NOVEL DEATH ASSOCIATED PROTEINS, AND THAP1 AND PAR4 PATHWAYS IN APOPTOSIS CONTROL
PCT/EP02/14027

FIELD OF THE INVENTION 

The present invention relates to genes and proteins of the THAP (THanatos (death)-Associated
Protein) family, and uses thereof. In particular, the invention relates to polypeptides
comprising a THAP domain, the modulation of THAP-mediated activities and the identification of 
compounds which modulate these activities. 

BACKGROUND 

Coordination of cell proliferation and cell death is required for normal development and 
tissue homeostasis in multicellular organisms. A defect in the normal coordination of these two 
processes is a fundamental requirement for tumorigenesis. 

Progression through the cell cycle is highly regulated, requiring the transit of numerous
checkpoints (for review, see Hunter, 1993). The extent of cell death is physiologically controlled by
activation of a programmed suicide pathway that results in a morphologically recognizable form of
death termed apoptosis (Jacobson et al., 1997; Vaux et al., 1994). Both extracellular signals, such
as tumor necrosis factor, and intracellular signals, such as p53, can induce apoptotic cell death.
Although many proteins involved in apoptosis or the cell cycle have been identified, the
mechanisms by which these two processes are coordinated are not well understood.

It is well established that molecules which modulate apoptosis have the potential to treat a 
wide range of conditions relating to cell death and cell proliferation. For example, such molecules 
may be used for inducing cell death for the treatment of cancers, inhibiting cell death for the 
treatment of neurodegenerative disorders, and inhibiting or inducing cell death for regulating 
angiogenesis. However, because many biological pathways controlling cell cycle and apoptosis 
have not yet been fully elucidated, there is a need for the identification of biological targets for the 
development of therapeutic molecules for the treatment of these disorders. 
PML nuclear bodies 

PML nuclear bodies (PML-NBs), also known as PODs (PML oncogenic domains), ND10
(nuclear domain 10) and Kr bodies, are discrete subnuclear domains that are specifically disrupted
in cells from acute promyelocytic leukemia (APL), a distinct subtype of human myeloid leukemia
(Maul et al., 2000; Ruggero et al., 2000; Zhong et al., 2000a). Their name derives from their most
intensively studied protein component, the promyelocytic leukemia protein (PML), a RING finger,
IFN-inducible protein encoded by a gene originally cloned as the t(15;17) chromosomal
translocation partner of the retinoic acid receptor (RAR) locus in APL. In APL cells, the presence
of the leukemogenic fusion protein PML-RAR leads to the disruption of PML-NBs and the
delocalization of PML and other PML-NB proteins into aberrant nuclear structures (Zhong et al.,
2000a). Treatment of APL cell lines and of patients with retinoic acid, which induces the
degradation of the PML-RAR oncoprotein, results in relocalization of PML and other NB
components into PML-NBs and in complete remission of clinical disease, respectively. The
deregulation of the PML-NBs by PML-RAR thus appears to play a critical role in tumorigenesis.
The analysis of mice in which the PML gene was disrupted by homologous recombination has
revealed that PML functions as a tumor suppressor in vivo (Wang et al., 1998a) and that it is essential for
multiple apoptotic pathways (Wang et al., 1998b). Pml-/- mice and cells are protected from Fas-,
TNFα-, ceramide- and IFN-induced apoptosis, as well as from DNA damage-induced apoptosis.
However, the molecular mechanisms through which PML modulates the response to pro-apoptotic
stimuli are not well understood (Wang et al., 1998b; Quignon et al., 1998). Recent studies indicate
that PML can participate in both p53-dependent and p53-independent apoptosis pathways (Guo et
al., 2000; Fogal et al., 2000). p53-dependent DNA damage-induced apoptosis, transcriptional
activation by p53 and induction of p53 target genes are all impaired in PML-/- primary cells (Guo
et al., 2000). PML physically interacts with p53 and acts as a transcriptional co-activator for p53.
This co-activator role of PML is absolutely dependent on its ability to recruit p53 into the PML-NBs
(Guo et al., 2000; Fogal et al., 2000). The existence of cross-talk between PML- and p53-dependent
growth suppression pathways implies an important role for PML-NBs and PML-NB-associated
proteins as modulators of p53 functions. In addition to p53, the pro-apoptotic factor
Daxx could be another important mediator of PML pro-apoptotic activities (Ishov et al., 1999;
Zhong et al., 2000b; Li et al., 2000). Daxx was initially identified by its ability to enhance Fas-induced
cell death. Daxx interacts with PML and localizes preferentially in the nucleus, where it
accumulates in the PML-NBs (Ishov et al., 1999; Zhong et al., 2000b; Li et al., 2000). Inactivation
of PML results in delocalization of Daxx from PML-NBs and complete abrogation of Daxx
pro-apoptotic activity (Zhong et al., 2000b). Daxx has recently been found to possess strong
transcriptional repressor activity (Li et al., 2000). By recruiting Daxx to the PML-NBs, PML may
inhibit Daxx-mediated transcriptional repression, thus allowing the expression of certain
pro-apoptotic genes.

PML-NBs contain several other proteins in addition to Daxx and p53. These include the
autoantigen Sp100 (Sternsdorf et al., 1999) and the Sp100-related protein Sp140 (Bloch et al., 1999),
the retinoblastoma tumor suppressor pRB (Alcalay et al., 1998), the transcriptional co-activator
CBP (LaMorte et al., 1998), the Bloom syndrome DNA helicase BLM (Zhong et al., 1999) and the
small ubiquitin-like modifier SUMO-1 (also known as sentrin-1 or PIC1; for recent reviews see
Yeh et al., 2000; Melchior, 2000; Jentsch and Pyrowolakis, 2000). Covalent modification of PML
by SUMO-1 (sumoylation) appears to play a critical role in PML accumulation into NBs (Muller et
al., 1998) and in the recruitment of other NB components to PML-NBs (Ishov et al., 1999; Zhong et
al., 2000c).
Prostate apoptosis response-4

Prostate apoptosis response-4 (PAR4) is a 38 kDa protein initially identified as the product
of a gene specifically upregulated in prostate tumor cells undergoing apoptosis (for reviews see
Rangnekar, 1998; Mattson et al., 1999). Consistent with an important role of PAR4 in apoptosis,
induction of PAR4 in cultured cells is found exclusively during apoptosis, and ectopic expression of
PAR4 in NIH-3T3 cells (Diaz-Meco et al., 1996), neurons (Guo et al., 1998), and prostate cancer and
melanoma cells (Sells et al., 1997) has been shown to sensitize these cells to apoptotic stimuli. In
addition, downregulation of PAR4 is critical for ras-induced survival and tumor progression
(Barradas et al., 1999), and suppression of PAR4 production by antisense technology prevents
apoptosis in several systems (Sells et al., 1997; Guo et al., 1998), including different models of
neurodegenerative disorders (Mattson et al., 1999), further emphasizing the critical role of PAR4 in
apoptosis. At the carboxy terminus, PAR4 contains both a leucine zipper domain (Par4LZ, amino
acids 290-332) and a partially overlapping death domain (Par4DD, amino acids 258-332).
Deletion of this carboxy-terminal part abrogates the pro-apoptotic function of PAR4 (Diaz-Meco et
al., 1996; Sells et al., 1997; Guo et al., 1998). On the other hand, overexpression of the PAR4 leucine
zipper/death domain acts in a dominant negative manner to prevent apoptosis induced by full-length
PAR4 (Sells et al., 1997; Guo et al., 1998). The PAR4 leucine zipper/death domain mediates
PAR4 interaction with other proteins by recognizing two different kinds of motifs: zinc fingers, such as
those of the Wilms tumor suppressor protein WT1 (Johnstone et al., 1996) and of the atypical isoforms of
protein kinase C (aPKCs; Diaz-Meco et al., 1996), and an arginine-rich domain from the death-associated
protein (DAP)-like kinase Dlk (Page et al., 1999). Among these interactions, the binding of PAR4
to aPKCs and the resulting inhibition of their enzymatic activity is of particular functional relevance,
because the aPKCs are known to play a key role in cell survival and their overexpression has been
shown to abrogate the ability of PAR4 to induce apoptosis (Diaz-Meco et al., 1996; Berra et al.,
1997).

SLC/CCL21

Chemokine SLC/CCL21 (also known as SLC, CKβ-9, 6Ckine, and exodus-2) is a member
of the CC (beta) chemokine subfamily. SLC/CCL21 contains the four conserved cysteines
characteristic of beta chemokines plus two additional cysteines in its unusually long
carboxyl-terminal domain. Human SLC/CCL21 cDNA encodes a 134-amino-acid, highly basic
precursor protein with a 23-residue signal peptide that is cleaved to form the predicted
111-residue mature protein. Mouse SLC/CCL21 cDNA encodes a 133-amino-acid
protein with a 23-residue signal peptide that is cleaved to generate the 110-residue mature
protein. Human and mouse SLC/CCL21 are highly conserved, exhibiting 86% amino acid sequence
identity. The gene for human SLC/CCL21 has been localized to human chromosome 9p13 rather
than chromosome 17, where the genes of many human CC chemokines are clustered. The
SLC/CCL21 gene lies within the same region of about 100 kb as the gene for MIP-3β/CCL19,
another recently identified CC chemokine. SLC/CCL21 was previously known
to be highly expressed in lymphoid tissues at the mRNA level, and to be a chemoattractant for T
and B lymphocytes (Nagira, et al. (1997) J. Biol. Chem. 272:19518-19524; Hromas, et al. (1997) J.
Immunol. 159:2554-2558; Hedrick, et al. (1997) J. Immunol. 159:1589-1593; Gunn, et al. (1998)
Proc. Natl. Acad. Sci. 95:258-263). SLC/CCL21 also induces both adhesion of lymphocytes to
intercellular adhesion molecule-1 and arrest of rolling cells (Campbell, et al. (1998) Science
279:381-384). All of the above properties are consistent with a role for SLC/CCL21 in regulating
trafficking of lymphocytes through lymphoid tissues. Unlike most CC chemokines, SLC/CCL21 is
not chemotactic for monocytes. However, it has been reported to inhibit hemopoietic progenitor
colony formation in a dose-dependent manner (Hromas et al. (1997) J. Immunol. 159:2554-2558).

Chemokine SLC/CCL21 is a ligand for chemokine receptor CCR7 (Rossi et al. (1997) J. 
Immunol. 158:1033; Yoshida et al. (1997) J. Biol. Chem. 272:13803; Yoshida et al. (1998) J. Biol. 
Chem. 273:7118; Campbell et al. (1998) J. Cell Biol. 141:1053). CCR7 is expressed on T cells and
dendritic cells (DC), consistent with the chemotactic action of SLC/CCL21 for both lymphocytes
and mature DC. Both memory (CD45RO+) and naive (CD45RA+) CD4+ and CD8+ T cells express
the CCR7 receptor (Sallusto et al. (1999) Nature 401:708). Within the memory T cell population,
CCR7 expression discriminates between T cells with effector function that can migrate to inflamed
tissues (CCR7−) and T cells that require a secondary stimulus prior to displaying effector functions
(CCR7+) (Sallusto et al. (1999) Nature 401:708). Unlike mature DC, immature DC neither express
CCR7 nor respond to the chemotactic action of CCL21 (Sallusto et al. (1998) Eur. J.
Immunol. 28:2760; Dieu et al. (1998) J. Exp. Med. 188:373). 

A key function of CCR7 and its two ligands SLC/CCL21 and MIP-3β/CCL19 is facilitating
recruitment and retention of cells in secondary lymphoid organs in order to promote efficient
antigen exposure to T cells. CCR7-deficient mice demonstrate poorly developed secondary lymphoid
organs and exhibit an irregular distribution of lymphocytes within lymph nodes, Peyer's patches, and
splenic periarteriolar lymphoid sheaths (Forster et al. (1999) Cell 99:23). These animals have
severely impaired primary T cell responses, largely due to the inability of interdigitating DC to
migrate to the lymph nodes (Forster et al. (1999) Cell 99:23). The overall findings to date support
the notion that CCR7 and its two ligands, CCL19 and CCL21, are key regulators of T cell responses
via their control of T cell/DC interactions. CCR7 is an important regulatory molecule with an
instructive role in determining the migration of cells to secondary lymphoid organs (Forster et al.
(1999) Cell 99:23; Nakano et al. (1998) Blood 91:2886).

SUMMARY OF THE INVENTION 
THAP1 (THanatos-Associated-Protein-1) 

In the past few years, the inventors have focused on the molecular characterization of novel
genes expressed in the specialized endothelial cells (HEVECs) of post-capillary high endothelial
venules (Girard and Springer, 1995a; Girard and Springer, 1995b; Girard et al., 1999). In the
present invention, they report the analysis of THAP1 (for THanatos (death)-Associated Protein-1), a
protein that localizes to PML-NBs. Two-hybrid screening of an HEVEC cDNA library with the
THAP1 bait led to the identification of a unique interacting partner, the pro-apoptotic protein
PAR4. PAR4 is also found to accumulate in PML-NBs, and targeting of the THAP1/PAR4
complex to PML-NBs is mediated by PML. Like PAR4, THAP1 is a pro-apoptotic
polypeptide. Its pro-apoptotic activity requires a novel protein motif in the amino-terminal part,
called the THAP domain. Together, these results define a novel PML-NB pathway for apoptosis that
involves the THAP1/PAR4 pro-apoptotic complex.

The present invention includes genes, proteins and biological pathways involved in
apoptosis. In some embodiments, the genes, proteins, and pathways disclosed herein may be used
for the development of polypeptide, nucleic acid or small molecule therapeutics.

The present invention provides a novel protein motif, the THAP domain. The present
inventors initially identified the THAP domain as a 90-residue protein motif in the amino-terminal
part of THAP1 that is essential for THAP1 pro-apoptotic activity. THAP1 (THanatos (death)-Associated
Protein-1), as determined by the present inventors, is a pro-apoptotic polypeptide that
forms a complex with the pro-apoptotic protein PAR4 and localizes in discrete subnuclear domains
known as PML nuclear bodies. Moreover, the THAP domain defines a novel family of
proteins, the THAP family, and the inventors have also provided at least twelve distinct members in
the human genome (THAP-0 to THAP11), all of which contain a THAP domain (typically 80-90
amino acids) in their amino-terminal part. The present invention thus includes nucleic acid
molecules, including in particular the complete cDNA sequences, encoding members of the THAP
family, portions thereof encoding the THAP domain or polypeptides homologous thereto, as well as
polypeptides encoded by the THAP family genes. The invention thus also includes diagnostic
and activity assays, and uses in therapeutics, for THAP family proteins or portions thereof, as well
as drug screening assays for identifying compounds capable of inhibiting (or stimulating) the
pro-apoptotic activity of a THAP family member.
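Family membership and the "homologous" polypeptides referred to here are framed elsewhere in this document as percent amino acid identity (for example, at least 30% or at least about 80% identity to SEQ ID NOs: 1-114, with XBLAST or Gapped BLAST named as the comparison tools). As a rough illustration only, the Python sketch below computes percent identity from a Needleman-Wunsch global alignment. This is a swapped-in technique, not the BLAST method the document names: BLAST performs heuristic local alignment with substitution matrices, so the unit scores (match +1, mismatch -1, gap -1), the identity denominator (alignment length including gaps) and the toy peptides are all assumptions, not THAP sequences.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1,
                 gap: int = -1) -> tuple[str, str]:
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns both sequences padded with '-' so that they have equal length.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a: str, b: str) -> float:
    """Identical aligned residues as a percentage of the alignment length."""
    aln_a, aln_b = global_align(a, b)
    if not aln_a:
        raise ValueError("at least one sequence must be non-empty")
    same = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    return 100.0 * same / len(aln_a)


# Hypothetical toy peptides differing at one position.
print(round(percent_identity("ACDEFGHIK", "ACDEYGHIK"), 1))  # 88.9
```

Under these assumptions, a candidate domain would count as homologous at the 30% level if `percent_identity(candidate, reference) >= 30.0`; an actual comparison would use a BLAST implementation with the parameters the document specifies.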

In one example of a THAP family member, THAP1 is determined to be an apoptosis-inducing
polypeptide expressed in human endothelial cells (HEVECs), and the THAP sequences required for
apoptotic activity in the THAP1 polypeptide are characterized. In further
aspects, the invention is also directed to the interaction of THAP1 with the pro-apoptotic protein
PAR4 and with PML-NBs, including methods of modulating THAP1/PAR4 interactions for the
treatment of disease. The invention also concerns the interaction between PAR4 and PML-NBs,
diagnostics for detection of said interaction (or localization), and modulation of said interactions for
the treatment of disease.

Compounds which modulate interactions between a THAP family member and a THAP-family
target molecule, between a THAP domain and a THAP-domain target molecule, or between PAR4 and a
PML-NB protein may be used to inhibit (or stimulate) apoptosis of different cell types in various
diseases. For example, such compounds may be used to inhibit or stimulate apoptosis of
endothelial cells in angiogenesis-dependent diseases, including but not limited to cancer,
cardiovascular diseases and inflammatory diseases, and to inhibit apoptosis of neurons in acute and
chronic neurodegenerative disorders, including but not limited to Alzheimer's, Parkinson's and
Huntington's diseases, amyotrophic lateral sclerosis, HIV encephalitis, stroke and epileptic seizures.

Oligonucleotide probes or primers hybridizing specifically with a THAP1 genomic DNA or 
cDNA sequence are also part of the present invention, as well as DNA amplification and detection 
methods using said primers and probes. 

Fragments of THAP family members or THAP domains include fragments encoded by
nucleic acids comprising at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200,
500, or 1000 consecutive nucleotides selected from the group consisting of SEQ ID NOs: 160-175,
or polypeptides comprising at least 8, 10, 12, 15, 20, 25, 30, 40, 50, 100, 150 or 200 consecutive
amino acids selected from the group consisting of SEQ ID NOs: 1-114.
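Fragments are defined here by a minimum number of consecutive nucleotides or amino acids, that is, contiguous windows of a parent sequence. A sequence of length L contains L - k + 1 overlapping windows of length k (none if k > L). The Python sketch below enumerates them; the 20-residue input is a hypothetical placeholder (the twenty standard amino acids in one-letter code), not one of the listed SEQ ID NOs.

```python
def consecutive_fragments(seq: str, k: int) -> list[str]:
    """Return every run of k consecutive residues in seq (a sliding window)."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    # range() is empty when k > len(seq), so no fragments are returned.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def fragment_count(length: int, k: int) -> int:
    """Number of length-k windows in a sequence of the given length."""
    return max(length - k + 1, 0)


# Hypothetical placeholder: the 20 standard amino acids in one-letter code.
peptide = "ACDEFGHIKLMNPQRSTVWY"
windows = consecutive_fragments(peptide, 8)
print(len(windows), windows[0], windows[-1])  # 13 ACDEFGHI PQRSTVWY
```

By the same count, a 90-residue THAP domain would contain `fragment_count(90, 8) == 83` overlapping windows of exactly 8 consecutive amino acids; larger minimum lengths give fewer windows.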

A further aspect of the invention includes recombinant vectors comprising any of the
nucleic acid sequences described above, in particular recombinant vectors comprising a
THAP1 regulatory sequence or a sequence encoding a THAP1 protein, a THAP family member, a
THAP domain, fragments of THAP family members and THAP domains, or homologues of THAP
family members/THAP domains, as well as host cells and transgenic non-human animals
comprising said nucleic acid sequences or recombinant vectors.

Another aspect of the invention relates to methods for the screening of substances or
molecules that inhibit or increase the expression of the THAP1 gene or of genes encoding THAP
family members, as well as to methods for the screening of substances or molecules that interact
with and/or inhibit or increase the activity of a THAP1 polypeptide or THAP family polypeptide.
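The screening methods described here ultimately come down to comparing an activity readout measured with and without a test compound. The Python sketch below shows one way such a comparison could be scored. Everything in it is an assumption for illustration: the readout (for example, a THAP1/PAR4 binding signal), the use of an unrelated counter-screen protein to judge selectivity, and the 50% and 20% cut-offs are hypothetical, and the numbers are not data from this document.

```python
from dataclasses import dataclass


@dataclass
class AssayReadout:
    """Mean signal for one protein, measured with and without the test compound."""
    vehicle: float   # signal with the vehicle (no compound) control
    compound: float  # signal in the presence of the test compound


def percent_change(r: AssayReadout) -> float:
    """Signed change in activity relative to the vehicle control, in percent."""
    if r.vehicle == 0:
        raise ValueError("vehicle control signal must be non-zero")
    return 100.0 * (r.compound - r.vehicle) / r.vehicle


def is_candidate_modulator(target: AssayReadout, counter: AssayReadout,
                           min_effect: float = 50.0,
                           max_off_target: float = 20.0) -> bool:
    """Hypothetical selectivity rule: a large change in the THAP-family target
    (increase or decrease) and only a small change in an unrelated protein."""
    return (abs(percent_change(target)) >= min_effect
            and abs(percent_change(counter)) <= max_off_target)


# Illustrative numbers only; not data from the specification.
thap = AssayReadout(vehicle=1000.0, compound=300.0)      # -70 %
unrelated = AssayReadout(vehicle=800.0, compound=760.0)  # -5 %
print(is_candidate_modulator(thap, unrelated))  # True
```

Taking the absolute value flags both inhibitors and activators, matching the "inhibit or increase" language above; in practice, replicate measurements and a statistical test would replace single mean values.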

In accordance with another aspect, the present invention provides a medicament comprising
an effective amount of a THAP family protein, e.g. THAP1, or an SLC/CCL21-binding fragment
thereof, together with a pharmaceutically acceptable carrier. The medicaments described herein
may be useful for treatment and/or prophylaxis.

In another aspect, the invention is concerned in particular with the use of a THAP
family protein, homologs thereof and fragments thereof, for example THAP1 or an SLC/CCL21-binding
fragment thereof, as an anti-inflammatory agent. The THAP family protein, for example
THAP1, and fragments thereof will be useful for the treatment of conditions mediated by
SLC/CCL21.

In a further aspect, the present invention provides a detection method comprising the steps
of providing an SLC/CCL21 chemokine-binding molecule which is a THAP family protein, for
example THAP1, or an SLC/CCL21-binding fragment thereof; contacting the SLC/CCL21-binding
THAP1 molecule with a sample; and detecting an interaction of the SLC/CCL21-binding THAP1
molecule with SLC/CCL21 chemokine in the sample.
In one example, the invention may be used to detect the presence of SLC/CCL21
chemokine in a biological sample. The SLC/CCL21-binding THAP1 molecule may usefully be
immobilized on a solid support, for example as a THAP1/Fc fusion.

In accordance with another aspect, the present invention provides a method for inhibiting
the activity of SLC/CCL21 chemokine in a sample, which method comprises contacting the sample
with an effective amount of an SLC/CCL21 chemokine-binding molecule which is a THAP1 protein
or an SLC/CCL21-binding fragment thereof.

In further aspects, the invention provides a purified THAP1 protein or an SLC/CCL21-binding
fragment thereof, or a THAP1/Fc fusion, for use in a method or a medicament as described
herein; and a kit comprising such a purified THAP1 protein or fragment.

The invention also envisages the use of fragments of the THAP1 protein, which fragments
have SLC/CCL21 chemokine-binding properties. The fragments may be peptides derived from the
protein. Use of such peptides can be preferable to the use of an entire protein or a substantial part
of a protein, for example because of the reduced immunogenicity of a peptide compared to a
protein. Such peptides may be prepared by a variety of techniques, including recombinant DNA
techniques and synthetic chemical methods.

It will also be evident that the THAP1 proteins for use in the invention may be prepared in
a variety of ways, in particular as recombinant proteins in a variety of expression systems. Any
standard system may be used, such as baculovirus expression systems or mammalian cell line
expression systems.

Other aspects of the invention are described in the following numbered paragraphs: 
1. A method of identifying a candidate modulator of apoptosis comprising: 

(a) contacting a THAP-family polypeptide or a biologically active fragment thereof
with a test compound, wherein said THAP-family polypeptide has at least 30% amino acid
identity to an amino acid sequence selected from the group consisting of SEQ ID NOs: 1-114; and

(b) determining whether said compound selectively modulates the activity of said
polypeptide;

wherein a determination that said test compound selectively modulates the activity of said 
polypeptide indicates that said compound is a candidate modulator of apoptosis. 

2. The method of Paragraph 1, wherein the THAP-family polypeptide comprises the 
amino acid sequence of SEQ ID NO 3, or a biologically active fragment thereof.

3. The method of Paragraph 1, wherein the THAP-family polypeptide comprises the 
amino acid sequence of SEQ ID NO 4, or a biologically active fragment thereof.

4. The method of Paragraph 1, wherein the THAP-family polypeptide comprises the 
amino acid sequence of SEQ ID NO 5, or a biologically active fragment thereof.

5. The method of Paragraph 1, wherein the THAP-family polypeptide comprises the 
amino acid sequence of SEQ ID NO 6, or a biologically active fragment thereof.
6. The method of Paragraph 1, wherein the THAP-family polypeptide comprises the
amino acid sequence of SEQ ID NO 7, or a biologically active fragment thereof.

7. The method of Paragraph 1, wherein the THAP-family polypeptide comprises the 
amino acid sequence of SEQ ID NO 8, or a biologically active fragment thereof.

8. The method of Paragraph 1, wherein the THAP-family polypeptide comprises the 
amino acid sequence of SEQ ID NO 9, or a biologically active fragment thereof. 

9. The method of Paragraph 1, wherein the THAP-family polypeptide comprises the 
amino acid sequence of SEQ ID NO 10, or a biologically active fragment thereof. 

10. The method of Paragraph 1, wherein the THAP-family polypeptide comprises the 
amino acid sequence of SEQ ID NO 11, or a biologically active fragment thereof.

11. The method of Paragraph 1, wherein the THAP-family polypeptide comprises the 
amino acid sequence of SEQ ID NO 12, or a biologically active fragment thereof.

12. The method of Paragraph 1, wherein the THAP-family polypeptide comprises the 
amino acid sequence of SEQ ID NO 13, or a biologically active fragment thereof. 

13. The method of Paragraph 1, wherein the THAP-family polypeptide comprises the 
amino acid sequence of SEQ ID NO 14, or a biologically active fragment thereof. 

14. The method of Paragraph 1, wherein the THAP-family polypeptide comprises the 
amino acid sequence selected from the group consisting of SEQ ID NOs: 15-114, and biologically
active fragments thereof.

15. The method of Paragraph 1, wherein said biologically active fragment of said 
THAP-family protein has at least one biological activity selected from the group consisting of 
interaction with a THAP-family target protein, binding to a nucleic acid sequence, binding to
PAR-4, binding to PML, binding to a polypeptide found in PML-NBs, localization to PML-NBs,
targeting a THAP-family target protein to PML-NBs, and inducing apoptosis. 

16. The method of any one of Paragraphs 2-15, wherein said THAP-family
polypeptide has at least one biological activity selected from the group consisting of interaction
with a THAP-family target protein, binding to a nucleic acid sequence, binding to PAR-4, binding
to PML, binding to a polypeptide found in PML-NBs, localization to PML-NBs, targeting a
THAP-family target protein to PML-NBs, and inducing apoptosis.

17. An isolated nucleic acid encoding a polypeptide having apoptotic activity, said
polypeptide consisting essentially of an amino acid sequence selected from the group consisting of:

(a) amino acid positions 1-90 of SEQ ID NO: 2, a fragment thereof having
apoptotic activity, or a polypeptide having at least 30% amino acid identity thereto;

(b) a polypeptide comprising a THAP-family domain consisting essentially of
amino acid positions 1 to 89 of SEQ ID NO 3, a fragment thereof having apoptotic activity,
or a polypeptide having at least 30% amino acid identity thereto;

(c) a polypeptide comprising a THAP-family domain consisting essentially of
amino acid positions 1 to 89 of SEQ ID NO 4, a fragment thereof having apoptotic activity,
or a polypeptide having at least 30% amino acid identity thereto;

(d) a polypeptide comprising a THAP-family domain consisting essentially of
amino acid positions 1 to 89 of SEQ ID NO 5, a fragment thereof having apoptotic activity,
or a polypeptide having at least 30% amino acid identity thereto;

(e) a polypeptide comprising a THAP-family domain consisting essentially of
amino acid positions 1 to 90 of SEQ ID NO 6, a fragment thereof having apoptotic
activity, or a polypeptide having at least 30% amino acid identity thereto;

(f) a polypeptide comprising a THAP-family domain consisting essentially of
amino acid positions 1 to 90 of SEQ ID NO 7, a fragment thereof having apoptotic activity,
or a polypeptide having at least 30% amino acid identity thereto;

(g) a polypeptide comprising a THAP-family domain consisting essentially of
amino acid positions 1 to 90 of SEQ ID NO 8, a fragment thereof having apoptotic
activity, or a polypeptide having at least 30% amino acid identity thereto;

(h) a polypeptide comprising a THAP-family domain consisting essentially of
amino acid positions 1 to 90 of SEQ ID NO 9, a fragment thereof having apoptotic activity,
or a polypeptide having at least 30% amino acid identity thereto;

(i) a polypeptide comprising a THAP-family domain consisting essentially of
amino acid positions 1 to 92 of SEQ ID NO 10, a fragment thereof having apoptotic activity,
or a polypeptide having at least 30% amino acid identity thereto;

(j) a polypeptide comprising a THAP-family domain consisting essentially of
amino acid positions 1 to 90 of SEQ ID NO 11, a fragment thereof having apoptotic
activity, or a polypeptide having at least 30% amino acid identity thereto;

(k) a polypeptide comprising a THAP-family domain consisting essentially of
amino acid positions 1 to 90 of SEQ ID NO 12, a fragment thereof having apoptotic
activity, or a polypeptide having at least 30% amino acid identity thereto;

(l) a polypeptide comprising a THAP-family domain consisting essentially of
amino acid positions 1 to 90 of SEQ ID NO 13, a fragment thereof having apoptotic
activity, or a polypeptide having at least 30% amino acid identity thereto; and

(m) a polypeptide comprising a THAP-family domain consisting essentially of
amino acid positions 1 to 90 of SEQ ID NO 14, a fragment thereof having apoptotic
activity, or a polypeptide having at least 30% amino acid identity thereto.
18. An isolated nucleic acid encoding a THAP-family polypeptide having apoptotic
activity, selected from the group consisting of:

(i) a nucleic acid molecule encoding a polypeptide comprising the amino acid
sequence of a sequence selected from the group consisting of SEQ ID NOs: 1-114;

(ii) a nucleic acid molecule comprising the nucleic acid sequence of a sequence
selected from the group consisting of SEQ ID NOs: 160-175 and the sequences complementary
thereto; and

(iii) a nucleic acid the sequence of which is degenerate, as a result of the genetic
code, to the sequence of a nucleic acid as defined in (i) and (ii).

19. The nucleic acid of Paragraph 18, wherein said nucleic acid comprises a nucleic
acid selected from the group consisting of SEQ ID NOs. 5, 7, 8 and 11.

20. The nucleic acid of Paragraph 18, wherein said nucleic acid comprises a nucleic 
acid selected from the group consisting of SEQ ID NOs. 162, 164, 165 and 168. 

21. An isolated nucleic acid encoding a THAP-family polypeptide having apoptotic
activity, comprising:

(i) the nucleic acid sequence of SEQ ID NOs: 1-2 or the sequence complementary
thereto; or

(ii) a nucleic acid molecule encoding a polypeptide comprising the amino acid
sequence of SEQ ID NOs: 1-2.

22. An isolated nucleic acid, said nucleic acid comprising a nucleotide sequence 
encoding: 

i) a polypeptide comprising an amino acid sequence having at least about 80% 
identity to a sequence selected from the group consisting of the polypeptides of SEQ ID NOs: 1-114
and the polypeptides encoded by the nucleic acids of SEQ ID NOs: 160-175; or

ii) a fragment of said polypeptide which possesses apoptotic activity. 

23. The nucleic acid of Paragraph 22, wherein said nucleic acid encodes a
polypeptide comprising an amino acid sequence having at least about 80% identity to a sequence
selected from the group consisting of the polypeptides of SEQ ID NOs: 5, 7, 8 and 11 and the
polypeptides encoded by the nucleic acids of SEQ ID NOs: 162, 164, 165 and 168, or a fragment of
said polypeptide which possesses apoptotic activity.

24. The nucleic acid of Paragraph 23, wherein said polypeptide comprises an amino 
acid sequence selected from the group consisting of the sequences of SEQ ID NOs: 5, 7, 8 and 11
and the polypeptides encoded by the nucleic acids of SEQ ID NOs: 162, 164, 165 and 168. 

25. The nucleic acid of Paragraph 23, wherein polypeptide identity is determined using 
an algorithm selected from the group consisting of XBLAST with the parameters score=50 and 
wordlength=3, Gapped BLAST with the default parameters of XBLAST, and BLAST with the 
default parameters of XBLAST. 

26. The nucleic acid of Paragraph 17, wherein said nucleic acid is operably linked to a 
promoter. 

27. An expression cassette comprising the nucleic acid of Paragraph 26.

28. A host cell comprising the expression cassette of Paragraph 27. 
providing a population of host cells comprising a recombinant nucleic acid
encoding said THAP-family protein of any one of SEQ ID NOs. 1-114; and

culturing said population of host cells under conditions conducive to the expression 
of said recombinant nucleic acid; 

whereby said polypeptide is produced within said population of host cells. 

30. The method of Paragraph 29 wherein said providing step comprises providing a 
population of host cells comprising a recombinant nucleic acid encoding said THAP-family protein 
of any one of SEQ ID NOs. 5, 7, 8 and 11. 

31. The method of Paragraph 29, further comprising purifying said polypeptide from 
said population of cells. 

32. An isolated THAP polypeptide encoded by the nucleic acid of any one of SEQ ID 
Nos. 160-175. 

33. The polypeptide of Paragraph 32, wherein said polypeptide is encoded by a nucleic 
acid selected from the group consisting of SEQ ID NOs. 5, 7, 8, 11, 162, 164, 165 and 168.

34. The polypeptide of Paragraph 32, wherein said polypeptide has at least one activity 
selected from the group consisting of interaction with a THAP-family target protein, binding to a 
nucleic acid sequence, binding to PAR-4, binding to PML, binding to a polypeptide found in PML- 
NBs, localization to PML-NBs, targeting a THAP-family target protein to PML-NBs, and inducing 
apoptosis. 

35. An isolated THAP polypeptide or fragment thereof, said polypeptide comprising at 
least 12 contiguous amino acids of a sequence selected from the group consisting of SEQ ID NOs:
1-114. 

36. The polypeptide of Paragraph 35, wherein said polypeptide comprises at least 12 
contiguous amino acids of a sequence selected from the group consisting of SEQ ID NOs. 5, 7, 8, 
and 11. 

37. The polypeptide of Paragraph 35, wherein said polypeptide has at least one activity 
selected from the group consisting of interaction with a THAP-family target protein, binding to a 
nucleic acid sequence, binding to PAR-4, binding to PML, binding to a polypeptide found in PML- 
NBs, localization to PML-NBs, targeting a THAP-family target protein to PML-NBs, and inducing 
apoptosis. 

38. An isolated THAP polypeptide or fragment thereof, said polypeptide comprising an 
amino acid sequence having at least about 80% amino acid sequence identity to a sequence selected 
from the group consisting of SEQ ID NOs: 1-114 or a fragment thereof, said polypeptide or 
fragment thereof having at least one activity selected from the group consisting of interaction with a 
THAP-family target protein, binding to a nucleic acid sequence, binding to PAR-4, binding to 
PML, binding to a polypeptide found in PML-NBs, localization to PML-NBs, targeting a
THAP-family target protein to PML-NBs, and inducing apoptosis.

39. The polypeptide of Paragraph 38, wherein said THAP polypeptide or fragment 
thereof comprises an amino acid sequence having at least about 80% amino acid sequence identity 
to a sequence selected from the group consisting of SEQ ID NOs: 5, 7, 8 and 11 or a fragment
thereof having at least one activity selected from the group consisting of interaction with a
THAP-family target protein, binding to a nucleic acid sequence, binding to PAR-4, binding to PML,
binding to a polypeptide found in PML-NBs, localization to PML-NBs, targeting a THAP-family 
target protein to PML-NBs, and inducing apoptosis. 

40. The polypeptide of Paragraph 38, wherein said polypeptide is selectively bound by 
an antibody raised against an antigenic polypeptide, or antigenic fragment thereof, said antigenic 
polypeptide comprising the polypeptide of any one of SEQ ID NOs: 1-114. 

41. The polypeptide of Paragraph 38, wherein said polypeptide is selectively bound by 
an antibody raised against an antigenic polypeptide, or antigenic fragment thereof, said antigenic 
polypeptide comprising the polypeptide of any one of SEQ ID NOs: 5, 7, 8 and 11.

42. The polypeptide of Paragraph 38, wherein said polypeptide comprises the 
polypeptide of SEQ ID NOs: 1-114.

43. The polypeptide of Paragraph 38, wherein said polypeptide comprises a 
polypeptide selected from the group consisting of SEQ ID NOs. 5, 7, 8 and 11. 

44. An antibody that selectively binds to the polypeptide of Paragraph 38. 

45. An antibody according to Paragraph 44, wherein said antibody is capable of 
inhibiting binding of said polypeptide to a THAP-family target polypeptide. 

46. An antibody according to Paragraph 44, wherein said antibody is capable of 
inhibiting apoptosis mediated by said polypeptide. 

47. The polypeptide of Paragraph 38, wherein identity is determined using an algorithm
selected from the group consisting of XBLAST with the parameters score=50 and wordlength=3, 
Gapped BLAST with the default parameters of XBLAST, and BLAST with the default parameters 
of XBLAST. 

48. A method of assessing the biological activity of a THAP-family polypeptide 
comprising: 

(a) providing a THAP-family polypeptide or a fragment thereof; and 

(b) assessing the ability of the THAP-family polypeptide to induce apoptosis of a cell. 

49. A method of assessing the biological activity of a THAP-family polypeptide 
comprising: 

(a) providing a THAP-family polypeptide or a fragment thereof; and 

(b) assessing the DNA binding activity of the THAP-family polypeptide. 
50. The method of Paragraphs 48 or 49, wherein step (a) comprises introducing into a
cell a recombinant vector comprising a nucleic acid encoding a THAP-family polypeptide.

51. The method of Paragraphs 49 or 50, wherein the THAP-family polypeptide 
comprises a THAP consensus amino acid sequence depicted in SEQ ID NOs : 1-2, or a fragment 
thereof having at least one activity selected from the group consisting of interaction with a THAP- 
family target protein, binding to a nucleic acid sequence, binding to PAR-4, binding to PML,
binding to a polypeptide found in PML-NBs, localization to PML-NBs, targeting a THAP-family
target protein to PML-NBs, and inducing apoptosis. 

52. The method of Paragraph 49, wherein the THAP-family polypeptide comprises an 
amino acid sequence selected from the group of sequences consisting of SEQ ID NOs: 1-114 or a
fragment thereof having at least one activity selected from the group consisting of interaction with a 
THAP-family target protein, binding to a nucleic acid sequence, binding to PAR-4, binding to 
PML, binding to a polypeptide found in PML-NBs, localization to PML-NBs, targeting a THAP- 
family target protein to PML-NBs, and inducing apoptosis. 

53. The method of Paragraph 49, wherein the THAP-family polypeptide comprises a 
native THAP-family polypeptide, or a fragment thereof having at least one activity selected from 
the group consisting of interaction with a THAP-family target protein, binding to a nucleic acid 
sequence, binding to PAR-4, binding to PML, binding to a polypeptide found in PML-NBs,
localization to PML-NBs, targeting a THAP-family target protein to PML-NBs, and inducing
apoptosis.

54. The method of Paragraph 49, wherein the THAP-family polypeptide comprises a 
THAP-family polypeptide or a fragment thereof having at least one activity selected from the group
consisting of interaction with a THAP-family target protein, binding to a nucleic acid sequence,
binding to PAR-4, binding to PML, binding to a polypeptide found in PML-NBs, localization to
PML-NBs, targeting a THAP-family target protein to PML-NBs, and inducing apoptosis, wherein
said THAP-family polypeptide or fragment thereof comprises at least one amino acid deletion,
substitution or insertion. 

55. An isolated THAP-family polypeptide comprising an amino acid sequence of SEQ 
ID NOs: 1-114, wherein said polypeptide comprises at least one amino acid deletion, substitution or
insertion with respect to said amino acid sequence of SEQ ID NOs. 1-114. 

56. A THAP-family polypeptide comprising an amino acid sequence selected from the 
group consisting of SEQ ID NOs: 1-114, wherein said polypeptide comprises at least one amino 
acid deletion, substitution or insertion with respect to said amino acid sequence of one of SEQ ID 
NOs. 1-114 and displays a reduced ability to induce apoptosis or bind DNA compared to the
wild-type polypeptide.

57. A THAP-family polypeptide comprising an amino acid sequence of SEQ ID NOs:
1-114, wherein said polypeptide comprises at least one amino acid deletion, substitution or insertion
with respect to said amino acid sequence of one of SEQ ID NOs. 1-114 and displays an increased
ability to induce apoptosis or bind DNA compared to the wild-type polypeptide.

58. A method of determining whether a THAP-family polypeptide is expressed within 
a biological sample, said method comprising the steps of : 

(a) contacting a biological sample from a subject with: 

a polynucleotide that hybridizes under stringent conditions to a nucleic acid of SEQ ID 
NOs: 160-175 or 

a detectable polypeptide that selectively binds to the polypeptide of SEQ ID NOs: 1-114;

and 

(b) detecting the presence or absence of hybridization between said polynucleotide and an 
RNA species within said sample, or the presence or absence of binding of said detectable 
polypeptide to a polypeptide within said sample; 

wherein a detection of said hybridization or of said binding indicates that said THAP- 
family polypeptide is expressed within said sample. 

59. The method of Paragraph 58, wherein said subject suffers from, is suspected of 
suffering from, or is susceptible to a cell proliferative disorder. 

60. The method of Paragraph 59, wherein said cell proliferative disorder is a disorder 
related to regulation of apoptosis.

61. The method of Paragraph 58, wherein said polynucleotide is a primer, and wherein 
said hybridization is detected by detecting the presence of an amplification product comprising said 
primer sequence. 

62. The method of Paragraph 58, wherein said detectable polypeptide is an antibody. 

63. A method of assessing THAP-family activity in a biological sample, said method 
comprising the steps of : 

(a) contacting a nucleic acid molecule comprising a binding site for a THAP-family 
polypeptide with : 

(i) a biological sample from a subject or 

(ii) a THAP-family polypeptide isolated from a biological sample from a subject, the 
polypeptide comprising the amino acid sequence of one of SEQ ID NOs: 1-114; and

(b) assessing the binding between said nucleic acid molecule and a THAP-family 
polypeptide;

wherein a detection of decreased binding compared to a reference THAP-family nucleic 
acid binding level indicates that said sample comprises a deficiency in THAP-family activity. 

64. A method of determining whether a mammal has an elevated or reduced level of 
THAP-family expression; said method comprising the steps of : 

(a) providing a biological sample from said mammal; and 
(b) comparing the amount of a THAP-family polypeptide of SEQ ID NOs: 1-114 or of a
THAP-family RNA species encoding a polypeptide of SEQ ID NOs: 1-114 within said biological
sample with a level detected in or expected from a control sample;

wherein an increased amount of said THAP-family polypeptide or said THAP-family RNA
species within said biological sample compared to said level detected in or expected from said
control sample indicates that said mammal has an elevated level of THAP-family expression, and
wherein a decreased amount of said THAP-family polypeptide or said THAP-family RNA species
within said biological sample compared to said level detected in or expected from said control
sample indicates that said mammal has a reduced level of THAP-family expression.
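A minimal sketch of the comparison in step (b), assuming expression has already been quantified as one number per sample (for example a normalized RNA or protein measurement) and assuming a hypothetical fold-change cut-off to decide what counts as an increased or decreased amount. Neither the cut-off value nor the function name `classify_expression` comes from the text above.

```python
def classify_expression(sample_level: float, control_level: float,
                        fold_threshold: float = 2.0) -> str:
    """Return 'elevated', 'reduced' or 'unchanged' relative to a control.

    fold_threshold is a hypothetical cut-off: the sample must differ from
    the control by at least this factor to be called elevated or reduced.
    """
    if control_level <= 0 or sample_level < 0:
        raise ValueError("control must be positive and sample non-negative")
    if fold_threshold < 1.0:
        raise ValueError("fold_threshold must be >= 1")
    ratio = sample_level / control_level
    if ratio >= fold_threshold:
        return "elevated"
    if ratio <= 1.0 / fold_threshold:
        return "reduced"
    return "unchanged"


print(classify_expression(10.0, 4.0))  # elevated (2.5-fold)
print(classify_expression(1.0, 4.0))   # reduced (0.25-fold)
```

A symmetric threshold is used here only for simplicity; in practice the cut-off would depend on assay variability.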

65. The method of Paragraph 64, wherein said mammal suffers from, is suspected of 
suffering from, or is susceptible to a cell proliferative disorder. 

66. A method of identifying a candidate inhibitor of a THAP-family polypeptide, a 
candidate inhibitor of apoptosis, or a candidate compound for the treatment of a cell proliferative 
disorder, said method comprising: 

(a) contacting a THAP-family polypeptide according to SEQ ID NOs: 1-114 or a fragment
comprising a contiguous span of at least 6 contiguous amino acids of a polypeptide according to 
SEQ ID NOs: 1-114 with a test compound; and 

(b) determining whether said compound selectively binds to said polypeptide; 

wherein a determination that said compound selectively binds to said polypeptide indicates 
that said compound is a candidate inhibitor of a THAP-family polypeptide, a candidate inhibitor of 
apoptosis, or a candidate compound for the treatment of a cell proliferative disorder. 

67. A method of identifying a candidate inhibitor of apoptosis, a candidate compound 
for the treatment of a cell proliferative disorder, or a candidate inhibitor of a THAP-family 
polypeptide of SEQ ID NOs: 1-114 or a fragment comprising a contiguous span of at least 6 
contiguous amino acids of a polypeptide according to SEQ ID NOs: 1-114, said method 
comprising: 

(a) contacting said THAP-family polypeptide with a test compound; and 

(b) determining whether said compound selectively inhibits at least one biological activity 
selected from the group consisting of interaction with a THAP-family target protein, binding to a 
nucleic acid sequence, binding to PAR-4, binding to PML, binding to a polypeptide found in PML- 
NBs, localization to PML-NBs, targeting a THAP-family target protein to PML-NBs, and inducing 
apoptosis; 

wherein a determination that said compound selectively inhibits said at least one biological 
activity of said polypeptide indicates that said compound is a candidate inhibitor of a THAP-family 
polypeptide, a candidate inhibitor of apoptosis, or a candidate compound for the treatment of a cell 
proliferative disorder. 
68. A method of identifying a candidate inhibitor of apoptosis, a candidate compound
for the treatment of a cell proliferative disorder, or a candidate inhibitor of a THAP-family
polypeptide of SEQ ID NOs: 1-114 or a fragment comprising a contiguous span of at least 6 
contiguous amino acids of a polypeptide according to SEQ ID NOs: 1-114, said method 
comprising: 

(a) contacting a cell comprising said THAP-family polypeptide with a test compound; and 

(b) determining whether said compound selectively inhibits at least one biological activity 
selected from the group consisting of interaction with a THAP-family target protein, binding to a 
nucleic acid sequence, binding to PAR-4, binding to PML, binding to a polypeptide found in PML- 
NBs, localization to PML-NBs, targeting a THAP-family target protein to PML-NBs, and inducing 
apoptosis; 

wherein a determination that said compound selectively inhibits said at least one biological 
activity of said polypeptide indicates that said compound is a candidate inhibitor of a THAP-family 
polypeptide, a candidate inhibitor of apoptosis, or a candidate compound for the treatment of a cell 
proliferative disorder. 

69. The method of Paragraphs 67 or 68, wherein step (b) comprises assessing apoptotic 
activity, and wherein a determination that said compound inhibits apoptosis indicates that said 
compound is a candidate inhibitor of said THAP-family polypeptide. 

70. The method of Paragraph 68 comprising introducing a nucleic acid comprising the 
nucleotide sequence encoding said THAP-family polypeptide according to any one of Paragraphs 
32-43 into said cell. 

71. A polynucleotide according to any one of Paragraphs 17-25 attached to a solid
support.

72. An array of polynucleotides comprising at least one polynucleotide according to 
Paragraph 71. 

73. An array according to Paragraph 72, wherein said array is addressable. 

74. A polynucleotide according to any one of Paragraphs 17 to 25 further comprising a
label.

75. A method of identifying a candidate activator of a THAP-family polypeptide, said 
method comprising : 

a) contacting a THAP-family polypeptide according to SEQ ID NOs: 1-114 or a fragment
comprising a contiguous span of at least 6 contiguous amino acids of a polypeptide according to
SEQ ID NOs: 1-114 with a test compound; and

b) determining whether said compound selectively binds to said polypeptide; 

wherein a determination that said compound selectively binds to said polypeptide 
indicates that said compound is a candidate activator of said polypeptide. 
76. A method of identifying a candidate activator of a THAP-family polypeptide of
SEQ ID NOs: 1-114 or a fragment comprising a contiguous span of at least 6 contiguous amino
acids of a polypeptide according to SEQ ID NOs: 1-114, said method comprising:

(a) contacting said polypeptide with a test compound; and 

(b) determining whether said compound selectively activates at least one biological activity 
selected from the group consisting of interaction with a THAP-family target protein, binding to a 
nucleic acid sequence, binding to PAR-4, binding to PML, binding to a polypeptide found in PML- 
NBs, localization to PML-NBs, targeting a THAP-family target protein to PML-NBs, and inducing 
apoptosis; 

wherein a determination that said compound selectively activates said at least one 
biological activity of said polypeptide indicates that said compound is a candidate activator of said 
polypeptide. 

77. A method of identifying a candidate activator of a THAP-family polypeptide of 
SEQ ID NOs: 1-114 or a fragment comprising a contiguous span of at least 6 contiguous amino
acids of a polypeptide according to SEQ ID NOs: 1-114, said method comprising:

(a) contacting a cell comprising said THAP-family polypeptide with a test compound; and 

(b) determining whether said compound selectively activates at least one biological activity 
selected from the group consisting of interaction with a THAP-family target protein, binding to a 
nucleic acid sequence, binding to PAR-4, binding to PML, binding to a polypeptide found in PML- 
NBs, localization to PML-NBs, targeting a THAP-family target protein to PML-NBs, and inducing 
apoptosis; 

wherein a determination that said compound selectively activates said at least one 
biological activity of said polypeptide indicates that said compound is a candidate activator of said 
polypeptide. 

78. The method of Paragraphs 76 or 77, wherein said determining step comprises 
assessing apoptotic activity, and wherein a determination that said compound increases apoptosis 
activity indicates that said compound is a candidate activator of said THAP-family polypeptide. 

79. The method of Paragraph 77 wherein step a) comprises introducing a nucleic acid 
comprising the nucleotide sequence encoding said THAP-family polypeptide according to any one 
of Paragraphs 17-25 into said cell. 

80. A method of identifying a candidate modulator of PAR4 activity, said method 
comprising: 

(a) providing a PAR4 polypeptide or a fragment thereof; and 

(b) providing a PML-NB polypeptide, or a polypeptide associated with PML-NBs, or a 
fragment thereof; and 

(c) determining whether a test compound selectively modulates the ability of said PAR4 
polypeptide to bind to said PML-NB polypeptide or polypeptide associated with PML-NBs; 
wherein a determination that said test compound selectively inhibits the ability of said
PAR4 polypeptide to bind to said PML-NB polypeptide or polypeptide associated with PML-NBs
indicates that said compound is a candidate modulator of PAR4 activity.

81. A method of identifying a candidate modulator of PAR4 activity, said method 
comprising: 

(a) providing a PAR4 polypeptide or a fragment thereof; and 

(b) determining whether a test compound selectively modulates the ability of said PAR4 
polypeptide to localise in PML-NBs; 

wherein a determination that said test compound selectively inhibits the ability of said 
PAR4 polypeptide to localise in PML-NBs indicates that said compound is a candidate modulator 
of PAR4 activity. 

82. A method of identifying a candidate inhibitor of THAP-family activity, said 
method comprising: 

(a) providing a THAP-family polypeptide of SEQ ID NOs: 1-114 or a fragment comprising
a contiguous span of at least 6 contiguous amino acids of a polypeptide according to SEQ ID
NOs: 1-114; and

(b) providing a THAP-family target polypeptide or a fragment thereof; and 

(c) determining whether a test compound selectively inhibits the ability of said THAP- 
family polypeptide to bind to said THAP-family target polypeptide; 

wherein a determination that said test compound selectively inhibits the ability of said 
THAP-family polypeptide to bind to said THAP-family target polypeptide indicates that said 
compound is a candidate inhibitor of THAP-family activity. 

83. The method of Paragraph 82, comprising providing a cell comprising: 

(a) a first expression vector comprising a nucleic acid encoding a THAP-family polypeptide 
of SEQ ID NOs: 1-114 or a fragment comprising a contiguous span of at least 6 contiguous
amino acids of a polypeptide according to SEQ ID NOs: 1-114; and

(b) a second expression vector comprising a nucleic acid encoding a THAP-family target 
polypeptide, or a fragment thereof. 

84. The method of Paragraph 82, wherein said THAP-family activity is apoptosis
activity.

85. The method of Paragraph 82, wherein said THAP-family target protein is PAR-4. 

86. The method of Paragraph 82, wherein said THAP-family polypeptide is a THAP-1,
THAP-2 or THAP-3 protein and said THAP-family target protein is PAR-4. 

87. A method of modulating apoptosis in a cell comprising modulating the activity of a 
THAP-family protein. 

88. The method of Paragraph 87, wherein said THAP-family protein is selected from 
the group consisting of SEQ ID NOs. 1-114. 
89. A method of modulating apoptosis in a cell comprising modulating the recruitment
of PAR-4 to a PML nuclear body. 

90. The method of Paragraph 89 wherein modulating the activity of a THAP-family 
protein comprises modulating the interaction of a THAP-family protein and a THAP-family target 
protein. 

91. The method of Paragraph 89 wherein modulating the activity of a THAP-family 
protein comprises modulating the interaction of a THAP-family protein and a PAR4 protein. 

92. The method of Paragraph 91 comprising modulating the interaction between a
THAP-1, THAP-2, or THAP-3 protein and a PAR-4 protein. 

93. A method of modulating the recruitment of PAR-4 to a PML nuclear body 
comprising modulating the interaction of said PAR-4 protein and a THAP-family protein. 

94. The method of Paragraph 93, wherein said THAP-family protein is selected from 
the group consisting of SEQ ID NOs. 1-114. 

95. A method of modulating angiogenesis in an individual comprising modulating the 
activity of a THAP-family protein in said individual. 

96. The method of Paragraph 95, wherein said THAP-family protein is selected from 
the group consisting of SEQ ID NOs. 1-114.

97. A method of preventing cell death in an individual comprising inhibiting the 
activity of a THAP-family protein in said individual. 

98. The method of Paragraph 97, wherein said THAP-family protein is selected from 
the group consisting of SEQ ID NOs. 1-114. 

99. The method according to Paragraph 97, wherein the activity of said THAP-family 
protein is inhibited in the CNS. 

100. A method of inducing angiogenesis in an individual comprising inhibiting the 
activity of a THAP-family protein in said individual. 

101. The method of Paragraph 100, wherein said THAP-family protein is selected from 
the group consisting of SEQ ID NOs. 1-114.

102. A method according to Paragraph 100, wherein the activity of said THAP-family 
protein is inhibited in endothelial cells. 

103. A method of inhibiting angiogenesis or treating cancer in an individual comprising 
increasing the activity of a THAP-family protein in said individual. 

104. The method of Paragraph 103, wherein said THAP-family protein is selected from 
the group consisting of SEQ ID NOs. 1-114. 

105. A method of treating inflammation or an inflammatory disorder in an individual 
comprising increasing the activity of a THAP-family protein in said individual. 

106. The method of Paragraph 105, wherein said THAP-family protein is selected from 
the group consisting of SEQ ID NOs. 1-114. 
107. A method according to Paragraphs 103 or 105, wherein the activity of said THAP-family
protein is increased in endothelial cells.

108. A method of treating cancer in an individual comprising increasing the activity of a 
THAP-family protein in said individual. 

109. The method of Paragraph 108, wherein said THAP-family protein is selected from 
the group consisting of SEQ ID NOs. 1-114.

110. The method of Paragraph 108, wherein increasing the activity of said THAP-family
protein induces apoptosis, inhibits cell division, inhibits metastatic potential, reduces tumor burden, 
increases sensitivity to chemotherapy or radiotherapy, kills a cancer cell, inhibits the growth of a 
cancer cell, kills an endothelial cell, inhibits the growth of an endothelial cell, inhibits angiogenesis, 
or induces tumor regression. 

111. A method according to any one of Paragraphs 87-110, comprising contacting said
subject with a recombinant vector encoding a THAP-family protein according to any one of 
Paragraphs 32-43 operably linked to a promoter that functions in said cell. 

112. The method of Paragraph 111, wherein said promoter functions in an endothelial
cell.

113. A viral composition comprising a recombinant viral vector encoding a THAP- 
family protein according to Paragraphs 32-43. 

114. The composition of Paragraph 113, wherein said recombinant viral vector is an 
adenoviral, adeno-associated viral, retroviral, herpes viral, papilloma viral, or hepatitis B viral
vector. 

115. A method of obtaining a nucleic acid sequence which is recognized by a THAP- 
family polypeptide comprising contacting a pool of random nucleic acids with said THAP-family 
polypeptide or a portion thereof and isolating a complex comprising said THAP-family polypeptide 
and at least one nucleic acid from said pool. 

116. The method of Paragraph 115 wherein said pool of nucleic acids is labeled.

117. The method of Paragraph 116 wherein said complex is isolated by performing a gel 
shift analysis. 

118. A method of identifying a nucleic acid sequence which is recognized by a THAP- 
family polypeptide comprising: 

(a) incubating a THAP-family polypeptide with a pool of labeled random nucleic
acids;

(b) isolating a complex between said THAP-family polypeptide and at least one 
nucleic acid from said pool; 

(c) performing an amplification reaction to amplify the at least one nucleic acid 
present in said complex; 
(d) incubating said at least one amplified nucleic acid with said THAP-family
polypeptide; 

(e) isolating a complex between said at least one amplified nucleic acid and said 
THAP-family polypeptide; 

(f) repeating steps (c), (d) and (e) a plurality of times; 

(g) determining the sequence of said nucleic acid in said complex. 
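Steps (a) to (g) describe an iterative selection-and-amplification scheme of the SELEX type. The toy simulation below assumes an idealized, error-free process: a hypothetical predicate `binds` stands in for isolating protein/nucleic acid complexes, and copying each bound sequence a fixed number of times stands in for the amplification reaction. As one common way of summarizing the recovered sequences in step (g), it then reports a per-position majority consensus. The function names, the example pool and the motif are all illustrative assumptions; the motif is not asserted to be the THAP-family binding site.

```python
from collections import Counter
from typing import Callable, List


def selex(pool: List[str], binds: Callable[[str], bool],
          rounds: int, copies: int = 2) -> List[str]:
    """Toy, error-free rounds of selection followed by amplification.

    binds() is a hypothetical stand-in for complex isolation (steps b/e);
    duplicating each bound sequence stands in for amplification (step c).
    """
    for _round in range(rounds):
        bound = [seq for seq in pool if binds(seq)]
        pool = [seq for seq in bound for _copy in range(copies)]
    return pool


def consensus(seqs: List[str]) -> str:
    """Most frequent base at each position of equal-length sequences."""
    if not seqs or len({len(s) for s in seqs}) != 1:
        raise ValueError("need a non-empty list of equal-length sequences")
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


# Arbitrary example pool and motif, chosen only for illustration.
start = ["AGGGCAAT", "TTTTTTTT", "CGGGCAGT", "ATATATAT", "AGGGCAGT"]
selected = selex(start, binds=lambda s: "GGGCA" in s, rounds=3)
print(len(selected), consensus(selected))  # 24 AGGGCAGT
```

A real selection is stochastic and enriches high-affinity sequences gradually over several rounds; the deterministic predicate here only illustrates the control flow of repeating steps (c) to (e).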

119. A method of identifying a compound which inhibits the ability of a THAP-family 
polypeptide to bind to a nucleic acid comprising: incubating a THAP-family polypeptide or a
fragment thereof which recognizes a binding site in a nucleic acid with a nucleic acid containing
said binding site in the presence or absence of a test compound and determining whether the level
of binding of said THAP-family polypeptide to said nucleic acid in the presence of said test
compound is less than the level of binding in the absence of said test compound. 

120. A method of identifying a test compound that modulates THAP-mediated activities 
comprising: 

contacting a THAP-family polypeptide or a biologically active fragment thereof 
with a test compound, wherein said THAP-family polypeptide comprises an amino acid 
sequence having at least 30% amino acid identity to an amino acid sequence of SEQ ID 
NO: 1; and 

determining whether said test compound selectively modulates the activity of said 
THAP-family polypeptide or biologically active fragment thereof, wherein a determination 
that said test compound selectively modulates the activity of said polypeptide indicates that 
said test compound is a candidate modulator of THAP-mediated activities. 

121. The method of Paragraph 120, wherein the THAP-family polypeptide comprises 
the amino acid sequence of SEQ ID NO: 1, or a biologically active fragment thereof. 

122. The method of Paragraph 120, wherein the THAP-family polypeptide comprises 
the amino acid sequence of SEQ ID NO: 2, or a biologically active fragment thereof. 

123. The method of Paragraph 120, wherein the THAP-family polypeptide comprises 
the amino acid sequence of SEQ ID NO: 3, or a biologically active fragment thereof.

124. The method of Paragraph 120, wherein the THAP-family polypeptide comprises 
the amino acid sequence of SEQ ID NO: 4, or a biologically active fragment thereof. 

125. The method of Paragraph 120, wherein the THAP-family polypeptide comprises 
the amino acid sequence of SEQ ID NO: 5, or a biologically active fragment thereof. 

126. The method of Paragraph 120, wherein the THAP-family polypeptide comprises 
the amino acid sequence of SEQ ID NO: 6, or a biologically active fragment thereof. 

127. The method of Paragraph 120, wherein the THAP-family polypeptide comprises 
the amino acid sequence of SEQ ID NO: 7, or a biologically active fragment thereof.
128. The method of Paragraph 120, wherein the THAP-family polypeptide comprises
the amino acid sequence of SEQ ID NO: 8, or a biologically active fragment thereof.

129. The method of Paragraph 120, wherein the THAP-family polypeptide comprises 
the amino acid sequence of SEQ ID NO: 9, or a biologically active fragment thereof. 

130. The method of Paragraph 120, wherein the THAP-family polypeptide comprises 
the amino acid sequence of SEQ ID NO: 10, or a biologically active fragment thereof. 

131. The method of Paragraph 120, wherein the THAP-family polypeptide comprises 
the amino acid sequence of SEQ ID NO: 11, or a biologically active fragment thereof.

132. The method of Paragraph 120, wherein the THAP-family polypeptide comprises 
the amino acid sequence of SEQ ID NO: 12, or a biologically active fragment thereof. 

133. The method of Paragraph 120, wherein the THAP-family polypeptide comprises 
the amino acid sequence of SEQ ID NO: 13, or a biologically active fragment thereof. 

134. The method of Paragraph 120, wherein the THAP-family polypeptide comprises 
the amino acid sequence of SEQ ID NO: 14, or a biologically active fragment thereof.

135. The method of Paragraph 120, wherein the THAP-family polypeptide comprises 
an amino acid sequence selected from the group consisting of SEQ ID NOs: 15-114, or a
biologically active fragment thereof.

136. The method of Paragraph 120, wherein said THAP-mediated activity is selected 
from the group consisting of interaction with a THAP-family target protein, binding to a nucleic 
acid, binding to PAR-4, binding to SLC, binding to PML, binding to a polypeptide found in PML- 
NBs, localization to PML-NBs, targeting a THAP-family target protein to PML-NBs, and inducing 
apoptosis.

137. The method of Paragraph 136, wherein said THAP-mediated activity is binding to
PAR-4.

138. The method of Paragraph 136, wherein said THAP-mediated activity is binding to
SLC.

139. The method of Paragraph 136, wherein said THAP-mediated activity is inducing 
apoptosis. 

140. The method of Paragraph 136, wherein said nucleic acid comprises a nucleotide 
sequence selected from the group consisting of SEQ ID NOs: 140-159. 

141. The method of Paragraph 120, wherein said amino acid identity is determined using 
an algorithm selected from the group consisting of XBLAST with the parameters score=50 and
wordlength=3, Gapped BLAST with the default parameters of XBLAST, and BLAST with the
default parameters of XBLAST.

142. An isolated or purified THAP domain polypeptide consisting essentially of an
amino acid sequence selected from the group consisting of SEQ ID NOs: 1-2, amino acids 1-89 of
SEQ ID NOs: 3-5, amino acids 1-90 of SEQ ID NOs: 6-9, amino acids 1-92 of SEQ ID NO: 10,
amino acids 1-90 of SEQ ID NOs: 11-14, and homologs having at least 30% amino acid identity to
any aforementioned sequence, wherein said polypeptide binds to a nucleic acid.

143. The isolated or purified THAP domain polypeptide of Paragraph 142 consisting
essentially of SEQ ID NO: 1.

144. The isolated or purified THAP domain polypeptide of Paragraph 142, wherein said
amino acid identity is determined using an algorithm selected from the group consisting of
XBLAST with the parameters score=50 and wordlength=3, Gapped BLAST with the default
parameters of XBLAST, and BLAST with the default parameters of XBLAST.

145. The isolated or purified THAP domain polypeptide of Paragraph 142, wherein said
nucleic acid comprises a nucleotide sequence selected from the group consisting of SEQ ID NOs:
140-159.

146. An isolated or purified nucleic acid which encodes the THAP domain polypeptide 
of Paragraph 142 or a complement thereof. 

147. An isolated or purified PAR4-binding domain polypeptide consisting essentially of
an amino acid sequence selected from the group consisting of amino acids 143-192 of SEQ ID NO:
3, amino acids 132-181 of SEQ ID NO: 4, amino acids 186-234 of SEQ ID NO: 5, SEQ ID NO:
15 and homologs having at least 30% amino acid identity to any aforementioned sequence, wherein
said polypeptide binds to PAR4.

148. The isolated or purified PAR4-binding domain of Paragraph 147 consisting
essentially of SEQ ID NO: 15.

149. The isolated or purified PAR4-binding domain of Paragraph 147 consisting 
essentially of amino acids 143-193 of SEQ ID NO: 3. 

150. The isolated or purified PAR4-binding domain of Paragraph 147 consisting
essentially of amino acids 132-181 of SEQ ID NO: 4.

151. The isolated or purified PAR4-binding domain of Paragraph 147 consisting 
essentially of amino acids 186-234 of SEQ ID NO: 5. 

152. The isolated or purified PAR4-binding domain polypeptide of Paragraph 147,
wherein said amino acid identity is determined using an algorithm selected from the group
consisting of XBLAST with the parameters score=50 and wordlength=3, Gapped BLAST with the
default parameters of XBLAST, and BLAST with the default parameters of XBLAST.

153. An isolated or purified nucleic acid which encodes the PAR4-binding domain 
polypeptide of Paragraph 147 or a complement thereof. 

154. An isolated or purified SLC-binding domain polypeptide consisting essentially of 
an amino acid sequence selected from the group consisting of amino acids 143-213 of SEQ ID NO: 
3 and homologs thereof having at least 30% amino acid identity, wherein said polypeptide binds to 
SLC. 
155. The isolated or purified SLC-binding domain polypeptide of Paragraph 154,
wherein said amino acid identity is determined using an algorithm selected from the group
consisting of XBLAST with the parameters score=50 and wordlength=3, Gapped BLAST with the
default parameters of XBLAST, and BLAST with the default parameters of XBLAST.

156. An isolated or purified nucleic acid which encodes the SLC-binding domain 
polypeptide of Paragraph 154 or a complement thereof. 

157. A fusion protein comprising an Fc region of an immunoglobulin fused to a 
polypeptide comprising an amino acid sequence selected from the group consisting of amino acids 
143-213 of SEQ ID NO: 3 and homologs thereof having at least 30% amino acid identity. 

158. An oligomeric THAP protein comprising a plurality of THAP polypeptides, 
wherein each THAP polypeptide comprises an amino acid sequence selected from the group 
consisting of amino acids 143-213 of SEQ ID NO: 3 and homologs thereof having at least 30%
amino acid identity. 

159. A medicament comprising an effective amount of a THAP1 polypeptide or an 
SLC-binding fragment thereof, together with a pharmaceutically acceptable carrier.

160. An isolated or purified THAP dimerization domain polypeptide consisting 
essentially of an amino acid sequence selected from the group consisting of amino acids 143-192
of SEQ ID NO: 3 and homologs thereof having at least 30% amino acid identity, wherein said
polypeptide binds to a THAP-family polypeptide.

161. The isolated or purified THAP dimerization domain polypeptide of Paragraph 160,
wherein said amino acid identity is determined using an algorithm selected from the group
consisting of XBLAST with the parameters score=50 and wordlength=3, Gapped BLAST with the
default parameters of XBLAST, and BLAST with the default parameters of XBLAST.

162. An isolated or purified nucleic acid which encodes the THAP dimerization domain 
polypeptide of Paragraph 160 or a complement thereof. 

163. An expression vector comprising a promoter operably linked to a nucleic acid 
having a nucleotide sequence selected from the group consisting of SEQ ID NOs: 160-175 and 
portions thereof comprising at least 18 consecutive nucleotides. 

164. The expression vector of Paragraph 163, wherein said promoter is a promoter 
which is not operably linked to said nucleic acid selected from the group consisting of SEQ ID 
NOs: 160-175 in a naturally occurring genome.

165. A host cell comprising the expression vector of Paragraph 163. 

166. An expression vector comprising a promoter operably linked to a nucleic acid 
encoding a polypeptide comprising an amino acid sequence selected from the group consisting of 
SEQ ID NOs: 1-114 and portions thereof comprising at least 18 consecutive nucleotides. 
167. The expression vector of Paragraph 166, wherein said promoter is a promoter
which is not operably linked to said nucleic acid selected from the group consisting of SEQ ID
NOs: 160-175 in a naturally occurring genome.

168. A host cell comprising the expression vector of Paragraph 166.

169. A method of identifying a candidate inhibitor of a THAP-family polypeptide, a
candidate inhibitor of apoptosis, or a candidate compound for the treatment of a cell proliferative 
disorder, said method comprising: 

contacting a THAP-family polypeptide comprising an amino acid sequence 
selected from the group consisting of SEQ ID NOs: 1-114 or a fragment comprising a span
of at least 6 contiguous amino acids of a polypeptide comprising an amino acid sequence
selected from the group consisting of SEQ ID NOs: 1-114 with a test compound; and

determining whether said compound selectively binds to said polypeptide, wherein 
a determination that said compound selectively binds to said polypeptide indicates that said 
compound is a candidate inhibitor of a THAP-family polypeptide, a candidate inhibitor of 
apoptosis, or a candidate compound for the treatment of a cell proliferative disorder. 

170. A method of identifying a candidate inhibitor of apoptosis, a candidate compound
for the treatment of a cell proliferative disorder, or a candidate inhibitor of a THAP-family
polypeptide of SEQ ID NOs: 1-114 or a fragment comprising a span of at least 6 contiguous amino
acids of a polypeptide according to SEQ ID NOs: 1-114, said method comprising:

contacting said THAP-family polypeptide with a test compound; and 
determining whether said compound selectively inhibits at least one biological 
activity selected from the group consisting of interaction with a THAP-family target 
protein, binding to a nucleic acid sequence, binding to PAR-4, binding to SLC, binding to
PML, binding to a polypeptide found in PML-NBs, localization to PML-NBs,
targeting a THAP-family target protein to PML-NBs, and inducing apoptosis, wherein
a determination that said compound selectively inhibits said at least one biological activity 
of said polypeptide indicates that said compound is a candidate inhibitor of a THAP-family 
polypeptide, a candidate inhibitor of apoptosis, or a candidate compound for the treatment 
of a cell proliferative disorder. 

171. A method of identifying a candidate inhibitor of apoptosis, a candidate compound 
for the treatment of a cell proliferative disorder, or a candidate inhibitor of a THAP-family 
polypeptide of SEQ ID NOs: 1-114 or a fragment comprising a span of at least 6 contiguous amino
acids of a polypeptide according to SEQ ID NOs: 1-114, said method comprising: 

contacting a cell comprising said THAP-family polypeptide with a test compound; and

determining whether said compound selectively inhibits at least one biological 
activity selected from the group consisting of interaction with a THAP-family target 
protein, binding to a nucleic acid sequence, binding to PAR-4, binding to SLC, binding to
PML, binding to a polypeptide found in PML-NBs, localization to PML-NBs,
targeting a THAP-family target protein to PML-NBs, and inducing apoptosis, wherein
a determination that said compound selectively inhibits said at least one biological activity 
of said polypeptide indicates that said compound is a candidate inhibitor of a THAP-family 
polypeptide, a candidate inhibitor of apoptosis, or a candidate compound for the treatment 
of a cell proliferative disorder. 

172. A method of identifying a candidate modulator of THAP-family activity, said 
method comprising: 

providing a THAP-family polypeptide of SEQ ID NOs: 1-114, or a fragment
comprising a span of at least 6 contiguous amino acids of a polypeptide according to SEQ
ID NOs: 1-114;

providing a THAP-family target polypeptide or a fragment thereof; and 
determining whether a test compound selectively modulates the ability of said 
THAP-family polypeptide to bind to said THAP-family target polypeptide, wherein a 
determination that said test compound selectively modulates the ability of said THAP- 
family polypeptide to bind to said THAP-family target polypeptide indicates that said 
compound is a candidate modulator of THAP-family activity. 

173. The method of Paragraph 172, wherein said THAP-family polypeptide is provided 
by a first expression vector comprising a nucleic acid encoding a THAP-family polypeptide of SEQ 
ID NOs: 1-114, or a fragment comprising a span of at least 6 contiguous amino acids of
a polypeptide according to SEQ ID NOs: 1-114, and wherein said THAP-family target polypeptide
is provided by a second expression vector comprising a nucleic acid encoding a THAP-family
target polypeptide, or a fragment thereof.

174. The method of Paragraph 172, wherein said THAP-family activity is apoptosis activity.

175. The method of Paragraph 172, wherein said THAP-family target protein is PAR-4. 

176. The method of Paragraph 172, wherein said THAP-family polypeptide is a THAP- 
1, THAP-2 or THAP-3 protein and said THAP-family target protein is PAR-4. 

177. The method of Paragraph 172, wherein said THAP-family target protein is SLC. 

178. A method of modulating apoptosis in a cell comprising modulating the activity of a 
THAP-family protein. 

179. The method of Paragraph 178, wherein said THAP-family protein is selected from 
the group consisting of SEQ ID NOs: 1-114. 

180. The method of Paragraph 178, wherein modulating the activity of a THAP-family 
protein comprises modulating the interaction of a THAP-family protein and a THAP-family target 
protein. 
181. The method of Paragraph 178, wherein modulating the activity of a THAP-family
protein comprises modulating the interaction of a THAP-family protein and a PAR4 protein.

182. A method of identifying a candidate activator of a THAP-family polypeptide, a
candidate activator of apoptosis, or a candidate compound for the treatment of a cell proliferative 
disorder, said method comprising: 

contacting a THAP-family polypeptide comprising an amino acid sequence 
selected from the group consisting of SEQ ID NOs: 1-98 or a fragment comprising a span 
of at least 6 contiguous amino acids of a polypeptide comprising an amino acid sequence 
selected from the group consisting of SEQ ID NOs: 1-98 with a test compound; and 

determining whether said compound selectively binds to said polypeptide, wherein
a determination that said compound selectively binds to said polypeptide indicates that said
compound is a candidate activator of a THAP-family polypeptide, a candidate activator of
apoptosis, or a candidate compound for the treatment of a cell proliferative disorder.

183. A method of identifying a candidate activator of apoptosis, a candidate compound
for the treatment of a cell proliferative disorder, or a candidate activator of a THAP-family
polypeptide of SEQ ID NOs: 1-98 or a fragment comprising a span of at least 6 contiguous amino
acids of a polypeptide according to SEQ ID NOs: 1-98, said method comprising:

contacting said THAP-family polypeptide with a test compound; and 
determining whether said compound selectively activates at least one biological 
activity selected from the group consisting of interaction with a THAP-family target 
protein, binding to a nucleic acid sequence, binding to PAR-4, binding to SLC, binding to
PML, binding to a polypeptide found in PML-NBs, localization to PML-NBs,
targeting a THAP-family target protein to PML-NBs, and inducing apoptosis, wherein
a determination that said compound selectively activates said at least one biological activity 
of said polypeptide indicates that said compound is a candidate activator of a THAP-family 
polypeptide, a candidate activator of apoptosis, or a candidate compound for the treatment 
of a cell proliferative disorder. 

184. A method of identifying a candidate activator of apoptosis, a candidate compound 
for the treatment of a cell proliferative disorder, or a candidate activator of a THAP-family
polypeptide of SEQ ID NOs: 1-98 or a fragment comprising a span of at least 6 contiguous amino
acids of a polypeptide according to SEQ ID NOs: 1-98, said method comprising:

contacting a cell comprising said THAP-family polypeptide with a test compound; and

determining whether said compound selectively activates at least one biological
activity selected from the group consisting of interaction with a THAP-family target
protein, binding to a nucleic acid sequence, binding to PAR-4, binding to SLC, binding to
PML, binding to a polypeptide found in PML-NBs, localization to PML-NBs,
targeting a THAP-family target protein to PML-NBs, and inducing apoptosis, wherein
a determination that said compound selectively activates said at least one biological activity
of said polypeptide indicates that said compound is a candidate activator of a THAP-family
polypeptide, a candidate activator of apoptosis, or a candidate compound for the treatment
of a cell proliferative disorder.

185. A method of ameliorating a condition associated with the activity of SLC in an 
individual comprising administering a polypeptide comprising the SLC binding domain of a THAP- 
family protein to said individual. 

186. The method of Paragraph 185, wherein said polypeptide comprises a fusion protein 
comprising an Fc region of an immunoglobulin fused to a polypeptide comprising an amino acid 
sequence selected from the group consisting of amino acids 143-213 of SEQ ID NO: 3 and 
homologs thereof having at least 30% amino acid identity. 

187. The method of Paragraph 185, wherein said polypeptide comprises an oligomeric 
THAP protein comprising a plurality of THAP polypeptides, wherein each THAP polypeptide 
comprises an amino acid sequence selected from the group consisting of amino acids 143-213 of
SEQ ID NO: 3 and homologs thereof having at least 30% amino acid identity.

188. A method of modulating angiogenesis in an individual comprising modulating the 
activity of a THAP-family protein in said individual. 

189. The method of Paragraph 188, wherein said THAP-family protein is selected from 
the group consisting of SEQ ID NOs: 1-114.

190. The method of Paragraph 188, wherein said modulation is inhibition. 

191. The method of Paragraph 188, wherein said modulation is induction. 

192. A method of reducing cell death in an individual comprising inhibiting the activity 
of a THAP-family protein in said individual. 

193. The method of Paragraph 192, wherein said THAP-family protein is selected from 
the group consisting of SEQ ID NOs: 1-114.

194. The method according to Paragraph 192, wherein the activity of said THAP-family 
protein is inhibited in the CNS. 

195. A method of reducing inflammation or an inflammatory disorder in an individual 
comprising modulating the activity of a THAP-family protein in said individual. 

196. The method of Paragraph 195, wherein said THAP-family protein is selected from 
the group consisting of SEQ ID NOs: 1-114.

197. A method of reducing the extent of cancer in an individual comprising modulating 
the activity of a THAP-family protein in said individual. 

198. The method of Paragraph 197, wherein said THAP-family protein is selected from 
the group consisting of SEQ ID NOs: 1-114. 
199. The method of Paragraph 197, wherein increasing the activity of said THAP-family
protein induces apoptosis, inhibits cell division, inhibits metastatic potential, reduces tumor burden,
increases sensitivity to chemotherapy or radiotherapy, kills a cancer cell, inhibits the growth of a
cancer cell, kills an endothelial cell, inhibits the growth of an endothelial cell, inhibits angiogenesis,
or induces tumor regression.
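The numbered paragraphs above define domains by 1-based, inclusive residue ranges (for example, amino acids 143-213 of SEQ ID NO: 3) and define homologs by a minimum percent amino acid identity (at least 30%) determined with XBLAST or BLAST. The Python sketch below illustrates only the bookkeeping behind those two ideas, under stated assumptions: it does not run BLAST, it assumes a pairwise alignment already exists (gaps written as '-'), and it normalizes identity over gap-free aligned columns, which may differ from how BLAST reports identity. All function names are hypothetical.

```python
def residue_range(seq: str, start: int, end: int) -> str:
    """Return residues start..end using 1-based, inclusive numbering
    (the convention of 'amino acids 143-213 of SEQ ID NO: 3')."""
    if not (1 <= start <= end <= len(seq)):
        raise ValueError("range lies outside the sequence")
    return seq[start - 1:end]


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical positions divided by aligned columns in which neither
    sequence has a gap, times 100.

    Assumption: this normalization is illustrative; BLAST may normalize
    over a different length (e.g. the whole local alignment).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    identical = sum(1 for x, y in columns if x == y)
    return 100.0 * identical / len(columns)


def is_homolog(aligned_a: str, aligned_b: str, threshold: float = 30.0) -> bool:
    """True if percent identity meets the 'at least 30%' threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Toy alignment (not a THAP sequence): 4 identities over 5 gap-free columns.
print(percent_identity("ACDE-F", "ACDKGF"))
print(is_homolog("ACDE-F", "ACDKGF"))
```

Restricting the comparison to a domain first (with `residue_range`) and then aligning that span is one way to apply a domain-level identity threshold; the claims themselves specify the XBLAST/BLAST settings that govern the actual determination.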

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1A illustrates an amino acid sequence alignment of human THAP1 (hTHAP1) (SEQ
ID NO: 3) and mouse THAP1 (mTHAP1) (SEQ ID NO: 99) orthologous polypeptides. Identical
amino acid residues are indicated with an asterisk.

Figure 1B depicts the primary structure of the human THAP1 polypeptide. Positions of the
THAP domain, the proline-rich region (PRO) and the bipartite nuclear localization sequence (NLS)
are indicated.

Figure 2 depicts the results of a Northern blot analysis of THAP1 mRNA expression in
human tissues. Each lane contains 2 µg of poly A+ RNA isolated from the indicated human tissues.
The blot was hybridized, under high-stringency conditions, with a 32P-labeled THAP1 cDNA
probe, and exposed at -70°C for 72 hours.

Figure 3A illustrates the interaction between THAP1 and PAR4 in a yeast two-hybrid
system. In particular, THAP1 binds to wild-type Par4 (Par4) and the leucine zipper-containing Par4
death domain (Par4DD) (amino acids 250-342 of PAR4), but not to a Par4 deletion mutant lacking the
death domain (PAR4Δ) (amino acids 1-276 of PAR4). A (+) indicates binding, whereas a (-)
indicates lack of binding.

Figure 3B shows the binding of in vitro translated, 35S-methionine-labeled THAP1 to a
GST-Par4DD polypeptide fusion. Par4DD was expressed as a GST fusion protein, then purified on
an affinity matrix of glutathione sepharose. GST served as a negative control. The input represents
1/10 of the material used in the binding assay.

Figure 4A illustrates the interaction between PAR4 and several THAP1 deletion mutants.
The THAP1 deletion mutants were tested for binding to either PAR4 or PAR4DD in a yeast
two-hybrid system (two-hybrid bait), to PAR4DD in GST pull-down assays (in vitro), and to
myc-Par4DD in primary human endothelial cells (in vivo). A (+) indicates binding, whereas a (-)
indicates lack of binding.

Figure 4B shows the binding of several in vitro translated, 35S-methionine-labeled THAP1
deletion mutants to a GST-Par4DD polypeptide fusion. Par4DD was expressed as a GST fusion
protein, then purified on an affinity matrix of glutathione sepharose. GST served as a negative
control. The input represents 1/10 of the material used in the binding assay.

Figure 5A depicts an amino acid sequence alignment of the Par4-binding domain of human
THAP1 (SEQ ID NO: 117) and mouse THAP1 (SEQ ID NO: 116) orthologues with that of mouse
ZIP kinase (SEQ ID NO: 115), another Par4 binding partner. An arginine-rich consensus Par4
binding site (SEQ ID NO: 15), derived from this alignment, is also indicated.

Figure 5B shows the primary structure of the THAP1 wild-type polypeptide and two
THAP1 mutants (THAP1Δ(QRCRR) and THAP1 RR/AA). THAP1Δ(QRCRR) is a deletion
mutant lacking amino acids 168-172 of THAP1 (SEQ ID NO: 3), whereas THAP1 RR/AA is a
mutant in which the two arginines at amino acid positions 171 and 172 of THAP1 (SEQ ID NO: 3)
are replaced with alanines. Results obtained in the yeast two-hybrid system with Par4 and Par4DD
baits (two-hybrid bait), in GST pull-down assays with GST-Par4DD (in vitro), and in the in vivo
interaction test with myc-Par4DD in primary human endothelial cells (in vivo) are summarized.
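The mutant names in Figure 5B use 1-based residue positions of SEQ ID NO: 3. As an illustrative sketch only, the two mutations can be expressed as simple string operations. The placeholder sequence below is hypothetical (it is not the THAP1 sequence); it merely places 'QRCRR' at positions 168-172 so that the indexing can be checked, and the helper names are likewise hypothetical.

```python
def delete_positions(seq: str, start: int, end: int) -> str:
    """Delete residues start..end (1-based, inclusive), as in THAP1Δ(QRCRR)."""
    return seq[:start - 1] + seq[end:]


def substitute_positions(seq: str, positions: list[int], new: str = "A") -> str:
    """Replace the residue at each 1-based position with `new`, as in RR/AA."""
    residues = list(seq)
    for pos in positions:
        residues[pos - 1] = new
    return "".join(residues)


# Hypothetical 213-residue placeholder with 'QRCRR' at positions 168-172.
placeholder = "G" * 167 + "QRCRR" + "G" * 41
assert placeholder[167:172] == "QRCRR"  # positions 168-172, 1-based

delta_qrcrr = delete_positions(placeholder, 168, 172)  # five residues removed
rr_aa = substitute_positions(placeholder, [171, 172])   # 'QRCRR' becomes 'QRCAA'
print(len(delta_qrcrr), rr_aa[167:172])
```

The deletion shortens the sequence by five residues, while the RR/AA substitution preserves length and changes only the two arginines at 171 and 172.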

Figure 6A is a graph which compares apoptosis levels in cells transfected with GFP- 
APSK1, GFP-Par4 or GFP-THAP1 expression vectors. Apoptosis was quantified by DAPI staining 
of apoptotic nuclei, 24 h after serum-withdrawal. Values are the means of three independent 
experiments. 

Figure 6B is a graph which compares apoptosis levels in cells transfected with GFP-APSK1
or GFP-THAP1 expression vectors. Apoptosis was quantified by DAPI staining of apoptotic
nuclei, 24 h after addition of TNFα. Values are the means of three independent experiments.
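Figures 6A and 6B report apoptosis as the percentage of cells with apoptotic nuclei after DAPI staining, averaged over three independent experiments. The arithmetic can be sketched as below. The choice of denominator (here assumed to be the number of transfected, GFP-positive cells scored) and the counts in the example are assumptions for illustration, not data from these experiments.

```python
from statistics import mean


def percent_apoptotic(apoptotic_nuclei: int, cells_scored: int) -> float:
    """Percentage of scored cells (assumed: GFP-positive transfected cells)
    whose DAPI-stained nuclei look apoptotic."""
    if cells_scored <= 0:
        raise ValueError("cells_scored must be positive")
    return 100.0 * apoptotic_nuclei / cells_scored


def mean_percent_apoptotic(experiments: list[tuple[int, int]]) -> float:
    """Mean percentage over independent experiments, each given as
    (apoptotic nuclei, cells scored)."""
    return mean(percent_apoptotic(a, n) for a, n in experiments)


# Hypothetical counts from three independent experiments (not real data).
example = [(30, 100), (45, 150), (24, 100)]
print(mean_percent_apoptotic(example))  # 28.0
```

Averaging per-experiment percentages (rather than pooling all counts) weights each experiment equally, which matches "means of three independent experiments"; pooling would instead weight experiments by the number of cells scored.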

Figure 7A shows the binding of in vitro translated, 35S-methionine-labeled THAP1 (wt) or
THAP1ΔTHAP (Δ) to a GST-Par4DD polypeptide fusion. Par4DD was expressed as a GST fusion
protein, then purified on an affinity matrix of glutathione sepharose. GST served as a negative
control. The input represents 1/10 of the material used in the binding assay.

Figure 7B is a graph which compares the proapoptotic activity of THAP1 with a THAP1 
mutant having its THAP domain (amino acids 1-90 of SEQ ID NO: 3) deleted. The percentage of 
apoptotic cells in mouse 3T3 fibroblasts overexpressing GFP-APSK1 (control), GFP-THAP1
(THAP1) or GFP-THAP1ΔTHAP (THAP1ΔTHAP) was determined by counting apoptotic nuclei
after DAPI staining. Values are the means of three independent experiments. 

Figure 8 depicts the primary structure of twelve human THAP proteins. The THAP domain 
(colored grey) is located at the amino-terminus of each of the twelve human THAP proteins. The 
black box in THAP1, THAP2 and THAP3 indicates a nuclear localization sequence, rich in basic 
residues, that is conserved in the three proteins. The number of amino acids in each THAP protein
is indicated; (*) indicates the protein is not full length. 

Figure 9A depicts an amino acid sequence alignment of the THAP domain of human
THAP1 (hTHAP1, SEQ ID NO: 123) with the DNA-binding domain of Drosophila melanogaster
P-element transposase (dmTransposase, SEQ ID NO: 124). Identical residues are boxed in black and
conserved residues in grey. A THAP domain consensus sequence (SEQ ID NO: 125) is also
shown.
Figure 9B depicts an amino acid sequence alignment of the THAP domains of the twelve
members of the human THAP family (hTHAP1, SEQ ID NO: 126; hTHAP2, SEQ ID NO: 131;
hTHAP3, SEQ ID NO: 127; hTHAP4, SEQ ID NO: 130; hTHAP5, SEQ ID NO: 128; hTHAP6,
SEQ ID NO: 135; hTHAP7, SEQ ID NO: 133; hTHAP8, SEQ ID NO: 129; hTHAP9, SEQ ID NO:
134; hTHAP10, SEQ ID NO: 137; hTHAP11, SEQ ID NO: 136; hTHAP0, SEQ ID NO: 132) with
the DNA-binding domain of Drosophila melanogaster P-element transposase (dmTransposase, SEQ
ID NO: 138). Residues conserved among at least seven of the thirteen sequences are boxed. Black
boxes indicate identical residues, whereas boxes shaded in grey show similar amino acids. Dashed
lines represent gaps introduced to align sequences. A THAP domain consensus sequence (SEQ ID
NO: 139) is also shown.

Figure 9C depicts an amino acid sequence alignment of 95 distinct THAP domain
sequences, including hTHAP1 through hTHAP11 and hTHAP0 (SEQ ID NOs: 3-14, listed
sequentially beginning from the top), with 83 THAP domains from other species (SEQ ID NOs: 1-
98, listed sequentially beginning at the sequence denoted sTHAP1 and ending at the sequence
denoted ceNP_498747.1), which were identified by searching GenBank genomic and EST
databases with the human THAP sequences. Residues conserved among at least 50% of the
sequences are boxed. Black boxes indicate identical residues, whereas boxes shaded in grey show
similar amino acids. Dashed lines represent gaps introduced to align sequences. The species are
indicated: Homo sapiens (h); Sus scrofa (s); Bos taurus (b); Mus musculus (m); Rattus norvegicus
(r); Gallus gallus (g); Xenopus laevis (x); Danio rerio (z); Oryzias latipes (o); Drosophila
melanogaster (dm); Anopheles gambiae; Bombyx mori (bm); Caenorhabditis elegans (ce). A
consensus sequence (SEQ ID NO: 2) is also shown. Amino acids underlined in the consensus
sequence are residues which are conserved in all 95 THAP sequences.

Figure 10A shows an amino acid sequence alignment of the human THAP1 (SEQ ID NO:
3), THAP2 (SEQ ID NO: 4) and THAP3 (SEQ ID NO: 5) protein sequences. Residues conserved
among at least two of the three sequences are boxed. Black boxes indicate identical residues,
whereas boxes shaded in grey show similar amino acids. Dashed lines represent gaps introduced to
align sequences. Regions corresponding to the THAP domain, the PAR4-binding domain and the
nuclear localization signal (NLS) are also indicated.

Figure 10B shows the primary structure of human THAP1, THAP2 and THAP3 and results 
of two-hybrid interactions between each THAP protein and Par4 or Par4 death domain (Par4DD) in
the yeast two hybrid system. 

Figure 10C shows the binding of in vitro translated, 35S-methionine-labeled THAP2 and
THAP3 to a GST-Par4DD polypeptide fusion. Par4DD was expressed as a GST fusion protein then 
purified on an affinity matrix of glutathione sepharose. GST served as negative control. The input 
represents 1/10 of the material used in the binding assay. 
Figure 11A is a graph which compares apoptosis levels in cells transfected with GFP-APSK1,
GFP-THAP2 or GFP-THAP3 expression vectors. Apoptosis was quantified by DAPI staining of
apoptotic nuclei, 24 h after serum withdrawal. Values are the means of two independent
representative experiments.

Figure 11B is a graph which compares apoptosis levels in cells transfected with GFP-
APSK1, GFP-THAP2 or GFP-THAP3 expression vectors. Apoptosis was quantified by DAPI
staining of apoptotic nuclei, 24 h after addition of TNFα. Values are the means of two
independent representative experiments.

Figure 12 illustrates the results obtained by screening several different THAP1 mutants in a 
yeast two-hybrid system with SLC/CCL21 bait. The primary structure of each THAP1 deletion 
mutant that was tested is shown. The 70 carboxy-terminal residues of THAP1 (amino acids
143-213) are sufficient for binding to chemokine SLC/CCL21.

Figure 13 illustrates the interaction of THAP1 with wild-type SLC/CCL21 and an
SLC/CCL21 mutant lacking the basic carboxy-terminal extension (SLC/CCL21ΔCOOH). The
interaction was analyzed both in the yeast two-hybrid system with THAP1 bait and in vitro using
GST pull-down assays with GST-THAP1.

Figure 14 depicts micrographs of primary human endothelial cells transfected with the
GFP-THAP0, 1, 2, 3, 6, 7, 8, 10 and 11 (green fluorescence) expression constructs. To reveal the
nuclear localization of the human THAP proteins, nuclei were counterstained with DAPI (blue).
The bar equals 5 µm.

Figure 15A is a threading-derived structural alignment between the THAP domain of
human THAP1 (THAP1) (amino acids 1-81 of SEQ ID NO: 3) and the thyroid receptor β
DNA-binding domain (NLLB) (SEQ ID NO: 121). The color coding is identical to that described in
Figure 15D. 

Figure 15B shows a model of the three-dimensional structure of the THAP domain of 
human THAP1 based on its homology with the crystallographic structure of thyroid receptor
β. The color coding is identical to that described in Figure 15D.

Figure 15C shows a model of the three-dimensional structure of the DNA-binding domain 
of Drosophila transposase (DmTRP) based on its homology with the crystallographic structure of 
the DNA-binding domain of the glucocorticoid receptor. The color coding is identical to that 
described in Figure 15D. 

Figure 15D is a threading-derived structural alignment between the Drosophila
melanogaster transposase DNA-binding domain (DmTRP) (SEQ ID NO: 120) and the
glucocorticoid receptor DNA-binding domain (GLUA) (SEQ ID NO: 122). In accordance with the
sequences and structures in Figures 15A-15C, the color coding is as follows: brown indicates
residues in α-helices; indigo indicates residues in β-strands; red denotes the eight conserved Cys
residues in NLLB and GLUA, or the three Cys residues common to THAP1 and DmTRP;
magenta indicates other Cys residues in THAP1 or DmTRP; cyan denotes the residues involved in
the hydrophobic interaction networks in THAP1 or DmTRP.

Figure 16A illustrates the results obtained by screening several different THAP1 mutants in
a yeast two-hybrid system with THAP1 bait. The primary structure of each THAP1 deletion mutant
that was tested is shown. A (+) indicates binding, whereas a (-) indicates no binding.

Figure 16B shows the binding of several in vitro translated, 35S-methionine-labeled THAP1
deletion mutants to a GST-THAP1 polypeptide fusion. Wild-type THAP1 was expressed as a GST
fusion protein, then purified on an affinity matrix of glutathione sepharose. GST served as a negative
control. The input represents 1/10 of the material used in the binding assay.

Figure 17A is an agarose gel showing that two distinct THAP1 cDNA fragments, approximately
400 and 600 nucleotides in length, were obtained by RT-PCR.

Figure 17B shows that the 400-nucleotide fragment corresponds to an alternatively spliced
isoform of human THAP1 cDNA lacking exon 2 (nucleotides 273-468 of SEQ ID NO: 160).

Figure 17C is a Western blot which shows that the second isoform of human THAP1
(THAP1b) encodes a truncated THAP1 protein (THAP1 C3) lacking the amino-terminal THAP
domain.

Figure 18A shows a specific DNA binding site recognized by the THAP domain of human
THAP1. The THAP domain recognizes GGGCAA or TGGCAA DNA target sequences, preferentially
organized as direct repeats with 5-nucleotide spacing (DR-5). The consensus sequence is
5'-GGGCAAnnnnnTGGCAA-3' (SEQ ID NO: 149). The DR-5 consensus was generated by
examination of 9 nucleic acids bound by THAP1 (SEQ ID NOs: 140-148, beginning sequentially
from the top).

Figure 18B shows a second specific DNA binding site recognized by the THAP domain of
human THAP1. The THAP domain recognizes everted repeats with 11-nucleotide spacing (ER-11)
having the consensus sequence 5'-TTGCCAnnnnnnnnnnnGGGCAA-3' (SEQ ID NO: 159). The
ER-11 consensus was generated by examination of 9 nucleic acids bound by THAP1 (SEQ ID NOs:
150-158, beginning sequentially from the top).
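The DR-5 and ER-11 consensus sites of Figures 18A and 18B can be located in a DNA sequence with a simple pattern scan. The Python sketch below is a hypothetical helper, not a method described in this application: it matches only the exact consensus strings given above (treating 'n' as any nucleotide), scans only the strand supplied, and will therefore miss degenerate sites. Because the THAP domain is described as recognizing either GGGCAA or TGGCAA half-sites, a looser variant could allow either half-site sequence at each position.

```python
import re

# Consensus sites as written in Figures 18A and 18B; lowercase 'n' means any nucleotide.
CONSENSUS_SITES = {
    "DR-5": "GGGCAAnnnnnTGGCAA",
    "ER-11": "TTGCCAnnnnnnnnnnnGGGCAA",
}


def consensus_to_regex(consensus: str) -> re.Pattern:
    """Translate a consensus string into a regex in which 'n' becomes [ACGT]."""
    body = "".join("[ACGT]" if base == "n" else base for base in consensus)
    # Zero-width lookahead with a capture group, so overlapping sites are all reported.
    return re.compile(f"(?=({body}))")


def find_sites(dna: str) -> list[tuple[str, int, str]]:
    """Return (site name, 1-based start, matched sequence) for exact consensus
    matches on the given strand only."""
    dna = dna.upper()
    hits = []
    for name, consensus in CONSENSUS_SITES.items():
        for match in consensus_to_regex(consensus).finditer(dna):
            hits.append((name, match.start() + 1, match.group(1)))
    return hits


# Synthetic sequences (not the THAP1-bound sequences shown in the figures).
print(find_sites("aaGGGCAAcccccTGGCAAaa"))  # [('DR-5', 3, 'GGGCAACCCCCTGGCAA')]
print(find_sites("TTGCCA" + "A" * 11 + "GGGCAA"))
```

Note that the ER-11 consensus already pairs TTGCCA (the reverse complement of TGGCAA) with GGGCAA, which is what makes it an everted rather than a direct repeat.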

DETAILED DESCRIPTION OF THE INVENTION

THAP and PAR4 biological pathways 

As mentioned above, the inventors have discovered a novel class of proteins involved in
apoptosis. The inventors have also linked a member of this novel class to another apoptosis
pathway (PAR4), and further linked both of these pathways to PML-NBs. Moreover, the
inventors have linked both of these pathways to endothelial cells, providing a range of novel
and potentially selective therapeutic treatments. In particular, it has been discovered that THAP1
(THanatos (death)-Associated-Protein-1) localizes to PML-NBs. Furthermore, two-hybrid
screening of an HEVEC cDNA library with the THAP1 bait led to the identification of a unique
interacting partner, the pro-apoptotic protein PAR4. PAR4 is also found to accumulate in PML-NBs.
Targeting of the THAP1/PAR4 complex to PML-NBs is mediated by PML. Like PAR4, THAP1
has pro-apoptotic activity. This activity involves a novel motif in the amino-terminal part, called
the THAP domain. Together, these results define a novel PML-NB pathway for apoptosis that
involves the THAP1/PAR4 pro-apoptotic complex.

THAP-family members, and uses thereof

The present invention includes polynucleotides encoding a family of pro-apoptotic polypeptides, THAP-0 to THAP11, and uses thereof for the modulation of apoptosis-related and other THAP-mediated activities. Included is THAP1, which forms a complex with the pro-apoptotic protein PAR4 and localizes in discrete subnuclear domains known as PML nuclear bodies. Additionally, THAP-family polypeptides can be used to alter or otherwise modulate the bioavailability of SLC/CCL21 (SLC).

The present invention also includes a novel protein motif, the THAP domain, which corresponds to an 89 amino acid region in the amino-terminal part of THAP1 and which is involved in THAP1 pro-apoptotic activity. The THAP domain defines a novel family of proteins, the THAP-family, with at least twelve distinct members in the human genome (THAP-0 to THAP11), which all contain a THAP domain in their amino-terminal part. The present invention thus pertains to nucleic acid molecules, including genomic and in particular the complete cDNA sequences, encoding members of the THAP-family, as well as the corresponding translation products, nucleic acids encoding THAP domains, homologues thereof, and nucleic acids encoding at least 10, 12, 15, 20, 25, 30, 40, 50, 100, 150 or 200 consecutive amino acids, to the extent that said span is consistent with the particular SEQ ID NO, of a sequence selected from the group consisting of SEQ ID NOs: 160-175.
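
The qualifier "to the extent that said span is consistent with the particular SEQ ID NO" can be read as requiring that a span of consecutive residues not exceed the length of the sequence. A hedged Python sketch of that bookkeeping follows, using the 213-residue THAP1 sequence of SEQ ID NO: 3 as an example; the function name is hypothetical, and the count reflects positions only, not any biological property of the fragments.

```python
def window_count(seq_len: int, span: int) -> int:
    """Number of distinct runs of `span` consecutive residues in a sequence
    of `seq_len` residues (0 when the span is longer than the sequence)."""
    if span < 1:
        raise ValueError("span must be positive")
    return max(0, seq_len - span + 1)


THAP1_LENGTH = 213  # residues in SEQ ID NO: 3, as stated in the text
SPANS = [10, 12, 15, 20, 25, 30, 40, 50, 100, 150, 200]

for span in SPANS:
    # A nucleic acid encoding `span` residues needs 3 * span coding nucleotides.
    print(span, window_count(THAP1_LENGTH, span), 3 * span)
```

For example, a 200-residue span fits the 213-residue sequence at 14 distinct start positions, whereas a span longer than 213 residues would not be consistent with that SEQ ID NO.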

THAP1 has been identified based on its expression in HEVs, specialized postcapillary venules found in lymphoid tissues, and in nonlymphoid tissues during chronic inflammatory diseases, which support a high level of lymphocyte extravasation from the blood. An important element in the cloning of the THAP1 cDNA from HEVECs was the development of protocols for obtaining HEVEC RNA, since HEVECs are not capable of maintaining their phenotype outside of their native environment for more than a few hours. A protocol was developed in which total RNA was obtained from HEVECs freshly purified from human tonsils. Highly purified HEVECs were obtained by a combination of mechanical and enzymatic procedures, immunomagnetic depletion and positive selection. Tonsils were minced finely with scissors on a steel screen and digested with a collagenase/dispase enzyme mix, and unwanted contaminating cells were then removed by immunomagnetic depletion. HEVECs were then selected by immunomagnetic positive selection with magnetic beads conjugated to the HEV-specific antibody MECA-79. From these HEVECs, which were 98% MECA-79-positive, 1 µg of total RNA was used to generate full-length cDNAs for THAP1 cDNA cloning and RT-PCR analysis.
As used herein, the terms "nucleic acids" and "nucleic acid molecule" are intended to include DNA molecules (e.g., cDNA or genomic DNA) and RNA molecules (e.g., mRNA) and analogs of the DNA or RNA generated using nucleotide analogs. The nucleic acid molecule can be single-stranded or double-stranded, but preferably is double-stranded DNA. Throughout the present specification, the expression "nucleotide sequence" may be employed to designate indifferently a polynucleotide or a nucleic acid. More precisely, the expression "nucleotide sequence" encompasses the nucleic material itself and is thus not restricted to the sequence information (i.e., the succession of letters chosen among the four base letters) that biochemically characterizes a specific DNA or RNA molecule. Also, the terms "nucleic acids", "oligonucleotides" and "polynucleotides" are used interchangeably herein.

An "isolated" nucleic acid molecule is one which is separated from other nucleic acid molecules which are present in the natural source of the nucleic acid. Preferably, an "isolated" nucleic acid is free of sequences which naturally flank the nucleic acid (i.e., sequences located at the 5' and 3' ends of the nucleic acid) in the genomic DNA of the organism from which the nucleic acid is derived. For example, in various embodiments, the isolated THAP-family nucleic acid molecule can contain less than about 5 kb, 4 kb, 3 kb, 2 kb, 1 kb, 0.5 kb or 0.1 kb of nucleotide sequences which naturally flank the nucleic acid molecule in genomic DNA of the cell from which the nucleic acid is derived. Moreover, an "isolated" nucleic acid molecule, such as a cDNA molecule, can be substantially free of other cellular material, or culture medium when produced by recombinant techniques, or substantially free of chemical precursors or other chemicals when chemically synthesized. A nucleic acid molecule of the present invention, e.g., a nucleic acid molecule having the nucleotide sequence of SEQ ID NOs: 160-175, or a portion thereof, can be isolated using standard molecular biology techniques and the sequence information provided herein. Using all or a portion of the nucleic acid sequence of SEQ ID NOs: 160-175 as a hybridization probe, THAP-family nucleic acid molecules can be isolated using standard hybridization and cloning techniques (e.g., as described in Sambrook, J., Fritsch, E. F., and Maniatis, T. Molecular Cloning: A Laboratory Manual. 2nd ed., Cold Spring Harbor Laboratory, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1989).

Moreover, a nucleic acid molecule encompassing all or a portion of, e.g., SEQ ID NOs: 160-175 can be isolated by the polymerase chain reaction (PCR) using synthetic oligonucleotide primers designed based upon the sequence of SEQ ID NOs: 160-175.

A nucleic acid of the invention can be amplified using cDNA, mRNA or alternatively genomic DNA as a template and appropriate oligonucleotide primers according to standard PCR amplification techniques. The nucleic acid so amplified can be cloned into an appropriate vector and characterized by DNA sequence analysis. Furthermore, oligonucleotides corresponding to THAP-family nucleotide sequences can be prepared by standard synthetic techniques, e.g., using an automated DNA synthesizer.
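
A minimal sketch of how PCR primers might be derived from a known sequence, and how the expected product can be checked in silico, is given below. It is a deliberately naive illustration, assuming primers taken directly from the two ends of the target region; it ignores melting temperature, GC content, secondary structure and off-target priming, which dedicated primer design tools take into account. The helper names and the short example sequence are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def naive_primers(target: str, length: int = 20):
    """Hypothetical helper: forward primer = first `length` bases of the
    target; reverse primer = reverse complement of its last `length` bases."""
    if length > len(target):
        raise ValueError("primer length exceeds target length")
    forward = target[:length]
    reverse = reverse_complement(target[-length:])
    return forward, reverse


def in_silico_amplicon(template: str, forward: str, reverse: str):
    """Return the plus-strand product spanning both primer sites, or None."""
    start = template.find(forward)
    if start < 0:
        return None
    # The reverse primer anneals where its reverse complement occurs on the
    # plus strand, downstream of the forward primer.
    rev_site = reverse_complement(reverse)
    end = template.find(rev_site, start + len(forward))
    if end < 0:
        return None
    return template[start:end + len(rev_site)]


target = "ATGGCTAAACCCGGGTTTCAGGTA"
template = "TTTT" + target + "TTTT"
fwd, rev = naive_primers(target, 6)
print(fwd, rev)                                      # ATGGCT TACCTG
print(in_silico_amplicon(template, fwd, rev) == target)  # True
```

Six-base primers are used only to keep the example readable; the 20-base default is closer to typical PCR primer lengths.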

As used herein, the term "hybridizes to" is intended to describe conditions for moderate stringency or high stringency hybridization, preferably where the hybridization and washing conditions permit nucleotide sequences at least 60% homologous to each other to remain hybridized to each other. Preferably, the conditions are such that sequences at least about 70%, more preferably at least about 80%, even more preferably at least about 85%, 90%, 95% or 98% homologous to each other typically remain hybridized to each other. Stringent conditions are known to those skilled in the art and can be found in Current Protocols in Molecular Biology, John Wiley & Sons, N.Y. (1989), 6.3.1-6.3.6. A preferred, non-limiting example of stringent hybridization conditions is as follows: the hybridization step is carried out at 65°C in the presence of 6 x SSC buffer, 5 x Denhardt's solution, 0.5% SDS and 100 µg/ml of salmon sperm DNA. The hybridization step is followed by four washing steps:

- two washes of 5 min each, preferably at 65°C in 2 x SSC and 0.1% SDS buffer;
- one wash of 30 min, preferably at 65°C in 2 x SSC and 0.1% SDS buffer;
- one wash of 10 min, preferably at 65°C in 0.1 x SSC and 0.1% SDS buffer;

these hybridization conditions being suitable for a nucleic acid molecule of about 20 nucleotides in length. It will be appreciated that the hybridization conditions described above are to be adapted according to the length of the desired nucleic acid, following techniques well known to those skilled in the art, for example according to the teachings disclosed in Hames B.D. and Higgins S.J. (1985) Nucleic Acid Hybridization: A Practical Approach, Hames and Higgins Ed., IRL Press, Oxford; and Current Protocols in Molecular Biology (supra). Preferably, an isolated nucleic acid molecule of the invention that hybridizes under stringent conditions to a sequence of SEQ ID NOs: 160-175 corresponds to a naturally-occurring nucleic acid molecule. As used herein, a "naturally-occurring" nucleic acid molecule refers to an RNA or DNA molecule having a nucleotide sequence that occurs in nature (e.g., encodes a natural protein).
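
The salt concentrations in the preferred hybridization and washing protocol can be made explicit. Assuming the standard SSC composition (1 x SSC = 150 mM NaCl and 15 mM trisodium citrate), the Python sketch below computes the NaCl, citrate and total Na+ concentrations for each step. The final 0.1 x SSC wash contains 20-fold less salt than the 2 x SSC washes, and lower salt is what makes a wash more stringent. The helper name is hypothetical.

```python
NACL_MM_PER_X = 150.0    # mM NaCl in 1 x SSC
CITRATE_MM_PER_X = 15.0  # mM trisodium citrate in 1 x SSC


def ssc_composition(strength: float):
    """Return (NaCl, citrate, total Na+) in mM for a given SSC strength."""
    nacl = NACL_MM_PER_X * strength
    citrate = CITRATE_MM_PER_X * strength
    sodium = nacl + 3 * citrate  # each trisodium citrate contributes 3 Na+
    return nacl, citrate, sodium


# Steps of the preferred protocol described above (all at 65 °C).
PROTOCOL = [
    ("hybridization", 6.0),
    ("wash, 2 x 5 min", 2.0),
    ("wash, 30 min", 2.0),
    ("wash, 10 min", 0.1),
]

for step, strength in PROTOCOL:
    nacl, citrate, sodium = ssc_composition(strength)
    print(f"{step:16s} {strength:>4} x SSC  NaCl {nacl:7.1f} mM  Na+ {sodium:7.1f} mM")
```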

To determine the percent homology of two amino acid sequences or of two nucleic acids, the sequences are aligned for optimal comparison purposes (e.g., gaps can be introduced in the sequence of a first amino acid or nucleic acid sequence for optimal alignment with a second amino acid or nucleic acid sequence, and non-homologous sequences can be disregarded for comparison purposes). In a preferred embodiment, the length of a reference sequence aligned for comparison purposes is at least 30%, preferably at least 40%, more preferably at least 50%, even more preferably at least 60%, and even more preferably at least 70%, 80%, 90% or 95% of the length of the reference sequence (e.g., when aligning a second sequence to e.g. a THAP-1 amino acid sequence of SEQ ID NO: 3 having 213 amino acid residues, at least 50, preferably at least 100, more preferably at least 200 amino acid residues are aligned; or when aligning a second sequence to the THAP-1 cDNA sequence of SEQ ID NO: 160 having 2173 nucleotides, or to nucleotides 202-835 which encode the amino acids of the THAP1 protein, preferably at least 100, preferably at least 200, more preferably at least 300, even more preferably at least 400, and even more preferably at least 500, at least 600, at least 700, at least 800, at least 900, at least 1000, at least 1200, at least 1400, at least 1600, at least 1800, or at least 2000 nucleotides are aligned). The amino acid residues or nucleotides at corresponding amino acid positions or nucleotide positions are then compared. When a position in the first sequence is occupied by the same amino acid residue or nucleotide as the corresponding position in the second sequence, then the molecules are homologous at that position (i.e., as used herein, amino acid or nucleic acid "identity" is equivalent to amino acid or nucleic acid "homology"). The percent homology between the two sequences is a function of the number of identical positions shared by the sequences (i.e., % homology = number (#) of identical positions / total number (#) of positions × 100).
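
The percent homology formula can be illustrated with a small global alignment. The Python sketch below uses a plain Needleman-Wunsch alignment with an assumed, illustrative scoring scheme (match +1, mismatch -1, linear gap -2). It is not the BLAST or ALIGN scoring referred to in this specification, and it counts "total positions" as the number of alignment columns, gaps included, which is one of several conventions in use.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Needleman-Wunsch global alignment with a linear gap penalty.
    Returns the two aligned strings, with '-' marking gaps."""
    n, m = len(a), len(b)

    def pair(x: str, y: str) -> int:
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),  # (mis)match
                score[i - 1][j] + gap,                           # gap in b
                score[i][j - 1] + gap,                           # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """% identity = identical positions / total alignment positions x 100."""
    if not aligned_a or len(aligned_a) != len(aligned_b):
        raise ValueError("need two non-empty alignment rows of equal length")
    identical = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * identical / len(aligned_a)


row_a, row_b = global_align("ACGT", "AGT")
print(row_a, row_b)                    # ACGT A-GT
print(percent_identity(row_a, row_b))  # 75.0
```

Here three of the four alignment columns hold identical residues, giving 3 / 4 × 100 = 75%.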

The comparison of sequences and determination of percent homology between two sequences can be accomplished using a mathematical algorithm. A preferred, non-limiting example of a mathematical algorithm utilized for the comparison of sequences is the algorithm of Karlin and Altschul (1990) Proc. Natl. Acad. Sci. USA 87:2264-68, modified as in Karlin and Altschul (1993) Proc. Natl. Acad. Sci. USA 90:5873-77. Such an algorithm is incorporated into the NBLAST and XBLAST programs (version 2.0) of Altschul, et al. (1990) J. Mol. Biol. 215:403-10. BLAST nucleotide searches can be performed with the NBLAST program, score=100, wordlength=12, to obtain nucleotide sequences homologous to THAP-family nucleic acid molecules of the invention. BLAST protein searches can be performed with the XBLAST program, score=50, wordlength=3, to obtain amino acid sequences homologous to THAP-family protein molecules of the invention. To obtain gapped alignments for comparison purposes, Gapped BLAST can be utilized as described in Altschul et al. (1997) Nucleic Acids Research 25(17):3389-3402. When utilizing BLAST and Gapped BLAST programs, the default parameters of the respective programs (e.g., XBLAST and NBLAST) can be used (see www.ncbi.nlm.nih.gov). Another preferred, non-limiting example of a mathematical algorithm utilized for the comparison of sequences is the algorithm of Myers and Miller, CABIOS (1989). Such an algorithm is incorporated into the ALIGN program (version 2.0), which is part of the GCG sequence alignment software package. When utilizing the ALIGN program for comparing amino acid sequences, a PAM120 weight residue table, a gap length penalty of 12, and a gap penalty of 4 can be used.
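
The `wordlength` parameter mentioned above refers to the length of the short words that BLAST-type programs use as seeds before extending an alignment. The sketch below shows only that seeding idea, in simplified form: it reports every pair of positions at which a query and a subject share an identical word of length `w`. It is an illustration under stated assumptions, not the NBLAST/XBLAST implementation, which additionally uses scored neighborhood words for proteins, ungapped and gapped extension, and significance statistics. The function name is hypothetical.

```python
from collections import defaultdict


def word_hits(query: str, subject: str, w: int):
    """Return (query_pos, subject_pos) pairs, 0-based, at which the two
    sequences share an identical word of length w."""
    if w < 1:
        raise ValueError("word length must be positive")
    # Index every length-w word of the query by its start positions.
    index = defaultdict(list)
    for i in range(len(query) - w + 1):
        index[query[i:i + w]].append(i)
    # Slide along the subject and look each word up in the index.
    hits = []
    for j in range(len(subject) - w + 1):
        for i in index.get(subject[j:j + w], []):
            hits.append((i, j))
    return hits


print(word_hits("ACGTAC", "TTACGT", 3))  # [(3, 1), (0, 2), (1, 3)]
```

A longer word length produces fewer, more specific seeds and therefore faster but less sensitive searches, which is why nucleotide searches typically use longer words than protein searches.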

The term "polypeptide" refers to a polymer of amino acids without regard to the length of the polymer; thus, peptides, oligopeptides, and proteins are included within the definition of polypeptide. This term also does not specify or exclude post-expression modifications of polypeptides; for example, polypeptides which include the covalent attachment of glycosyl groups, acetyl groups, phosphate groups, lipid groups and the like are expressly encompassed by the term polypeptide. Also included within the definition are polypeptides which contain one or more analogs of an amino acid (including, for example, non-naturally occurring amino acids, amino acids which only occur naturally in an unrelated biological system, modified amino acids from mammalian systems, etc.), polypeptides with substituted linkages, as well as other modifications known in the art, both naturally occurring and non-naturally occurring.

An "isolated" or "purified" protein or biologically active portion thereof is substantially free of cellular material or other contaminating proteins from the cell or tissue source from which the THAP-family or THAP domain polypeptide, or biologically active fragment or homologue thereof, is derived, or substantially free from chemical precursors or other chemicals when
chemically synthesized. The language "substantially free of cellular material" includes preparations 
of a protein according to the invention (e.g. THAP family or THAP domain polypeptide, or a 
biologically active fragment or homologue thereof) in which the protein is separated from cellular 
components of the cells from which it is isolated or recombinantly produced. In one embodiment, 
the language "substantially free of cellular material" includes preparations of a protein according to 
the invention having less than about 30% (by dry weight) of protein other than the THAP-family 
protein (also referred to herein as a "contaminating protein"), more preferably less than about 20% 
of protein other than the protein according to the invention, still more preferably less than about 
10% of protein other than the protein according to the invention, and most preferably less than 
about 5% of protein other than the protein according to the invention. When the protein according 
to the invention or biologically active portion thereof is recombinantly produced, it is also 
preferably substantially free of culture medium, i.e., culture medium represents less than about 
20%, more preferably less than about 10%, and most preferably less than about 5% of the volume 
of the protein preparation. 
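
The tiered thresholds above can be expressed as a small classification, assuming that the percentage is computed by dry weight as contaminating protein divided by total protein, and that each "less than about" limit is treated as a strict cutoff. The function name and labels are hypothetical, and the sketch covers only the contaminating-protein criterion, not the culture-medium criterion.

```python
# Preferred limits from the text (percent of protein, by dry weight, that is
# not the protein of interest), ordered from most to least preferred.
PURITY_TIERS = [
    (5.0, "most preferred (< about 5%)"),
    (10.0, "still more preferred (< about 10%)"),
    (20.0, "more preferred (< about 20%)"),
    (30.0, "substantially free (< about 30%)"),
]


def purity_tier(target_mg: float, contaminant_mg: float):
    """Hypothetical helper: classify a preparation by contaminating protein."""
    total = target_mg + contaminant_mg
    if total <= 0:
        raise ValueError("total protein must be positive")
    percent = 100.0 * contaminant_mg / total
    for limit, label in PURITY_TIERS:
        if percent < limit:
            return percent, label
    return percent, "not substantially free of cellular material"


print(purity_tier(96.0, 4.0))   # (4.0, 'most preferred (< about 5%)')
print(purity_tier(75.0, 25.0))  # (25.0, 'substantially free (< about 30%)')
```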

The language "substantially free of chemical precursors or other chemicals" includes 
preparations of THAP family or THAP domain polypeptide, or a biologically active fragment or 
homologue thereof in which the protein is separated from chemical precursors or other chemicals 
which are involved in the synthesis of the protein. In one embodiment, the language "substantially 
free of chemical precursors or other chemicals" includes preparations of a THAP-family protein 
having less than about 30% (by dry weight) of chemical precursors or non-THAP-family chemicals, 
more preferably less than about 20% chemical precursors or non-THAP-family or THAP-domain 
chemicals, still more preferably less than about 10% chemical precursors or non-THAP-family or 
THAP-domain chemicals, and most preferably less than about 5% chemical precursors or non-THAP-family or THAP-domain chemicals.

The term "recombinant polypeptide" is used herein to refer to polypeptides that have been 
artificially designed and which comprise at least two polypeptide sequences that are not found as 
contiguous polypeptide sequences in their initial natural environment, or to refer to polypeptides 
which have been expressed from a recombinant polynucleotide. 

Accordingly, another aspect of the invention pertains to anti-THAP-family or THAP-domain antibodies. The term "antibody" as used herein refers to immunoglobulin molecules and immunologically active portions of immunoglobulin molecules, i.e., molecules that contain an antigen binding site which specifically binds (immunoreacts with) a THAP-family or THAP domain polypeptide, or a biologically active fragment or homologue thereof. Examples of immunologically active portions of immunoglobulin molecules include F(ab) and F(ab')2 fragments, which can be generated by treating the antibody with an enzyme such as pepsin. The invention provides polyclonal and monoclonal antibodies that bind a THAP-family or THAP domain polypeptide, or a biologically active fragment or homologue thereof. The term "monoclonal antibody" or "monoclonal antibody composition", as used herein, refers to a population of antibody molecules that contain only one species of an antigen binding site capable of immunoreacting with a particular epitope of a THAP-family or THAP domain polypeptide. A monoclonal antibody composition thus typically displays a single binding affinity for a particular THAP-family or THAP domain protein with which it immunoreacts.
PAR4 

As mentioned above, prostate apoptosis response-4 (PAR4) is a 38 kDa protein initially identified as the product of a gene specifically upregulated in prostate tumor cells undergoing apoptosis (for reviews see Rangnekar, 1998; Mattson et al., 1999). For the PAR4 nucleic acid and amino acid sequences, see Johnstone et al., Mol. Cell. Biol. 16(12):6945-6956 (1996), and Genbank accession no. U63809 (SEQ ID NO: 118).

As used interchangeably herein, a "PAR4 activity", "biological activity of a PAR4" or "functional activity of a PAR4" refers to an activity exerted by a PAR4 protein, polypeptide or nucleic acid molecule as determined in vivo or in vitro, according to standard techniques. In one embodiment, a PAR4 activity is a direct activity, such as an association with a PAR4 target molecule or, most preferably, apoptosis induction activity or inhibition of cell proliferation or cell cycle. As used herein, a "target molecule" is a molecule with which a PAR4 protein binds or interacts in nature, such that a PAR4-mediated function is achieved. An example of a PAR4 target molecule is a THAP-family protein such as THAP1 or THAP2, or a PML-NB protein. A PAR4 target molecule can be a PAR4 protein or polypeptide or a non-PAR4 molecule. For example, a PAR4 target molecule can be a non-PAR4 protein molecule.

Alternatively, a PAR4 activity is an indirect activity, such as an activity mediated by interaction of the PAR4 protein with a PAR4 target molecule such that the target molecule modulates a downstream cellular activity (e.g., interaction of a PAR4 molecule with a PAR4 target molecule can modulate the activity of that target molecule on an intracellular signaling pathway).

Binding or interaction with a PAR4 target molecule (such as THAP1/PAR4 described herein) or with other targets can be detected, for example, using a two hybrid-based assay in yeast to find drugs that disrupt interaction of the PAR4 bait with the target (e.g., PAR4) prey, or an in vitro interaction assay with recombinant PAR4 and target proteins (e.g., THAP1 and PAR4).
SLC/CCL21 (SLC)

Biological Roles of SLC 

The signals which mediate T-cell infiltration during T-cell auto-immune diseases are poorly 
understood. SLC/CCL21 (SEQ ID NO: 119) is highly potent and highly specific for attracting T- 
cell migration. It was initially thought to be expressed only in secondary lymphoid organs, directing 
naive T-cells to areas of antigen presentation. However, using immunohistology it was found that 
expression of CCL21 was highly induced in endothelial cells of T-cell auto-immune infiltrative skin 
diseases (Christopherson et al. (2002) Blood electronic publication prior to printed publication). No 
other T-cell chemokine was consistently induced in these T cell skin diseases. The receptor for CCL21, CCR7, was also found to be highly expressed on the infiltrating T-cells, the majority of which expressed the memory CD45RO phenotype. Endothelial cells of inflamed venules expressing SLC/CCL21 in T cell infiltrative autoimmune skin diseases may therefore play a key role in the regulation of T-cell migration into these tissues.

There are a number of other autoimmune diseases where induced expression of 
SLC/CCL21 in endothelial cells may cause abnormal recruitment of T-cells from the circulation to 
sites of pathologic inflammation. For instance, chemokine SLC/CCL21 appears to be important for 
aberrant T-cell infiltration in experimental autoimmune encephalomyelitis (EAE), an animal model 
for multiple sclerosis (Alt et al. (2002) Eur J Immunol 32:2133-44). Migration of autoaggressive T 
cells across the blood-brain barrier (BBB) is critically involved in the initiation of EAE. The direct 
involvement of chemokines in this process was suggested by the observation that G-protein- 
mediated signaling is required to promote adhesion strengthening of encephalitogenic T cells on 
BBB endothelium in vivo. A search for chemokines present at the BBB by in situ hybridization and immunohistochemistry revealed expression of the lymphoid chemokines CCL19/ELC and CCL21/SLC in venules surrounded by inflammatory cells (Alt et al. (2002) Eur J Immunol 32:2133-44). Their expression was paralleled by the presence of their common receptor CCR7 in inflammatory cells in brain and spinal cord sections of mice afflicted with EAE. Encephalitogenic T cells showed surface expression of CCR7 and specifically chemotaxed towards both CCL19 and CCL21 in a concentration-dependent and pertussis toxin-sensitive manner comparable to naive lymphocytes in vitro. Binding assays on frozen sections of EAE brains demonstrated a functional involvement of CCL19 and CCL21 in adhesion strengthening of encephalitogenic T lymphocytes to inflamed venules in the brain (Alt et al. (2002) Eur J Immunol 32:2133-44). Taken together, these data suggested that the lymphoid chemokines CCL19 and CCL21, besides regulating lymphocyte homing to secondary lymphoid tissue, are involved in T lymphocyte migration into the immunoprivileged central nervous system during immunosurveillance and chronic inflammation.

Other diseases where induced expression of SLC/CCL21 in venular endothelial cells has 
been observed include rheumatoid arthritis (Page et al. (2002) J Immunol 168:5333-5341) and 
experimental autoimmune diabetes (Hjelmstrom et al. (2000) Am J Path 156:1133-1138). 
Therefore, chemokine SLC/CCL21 may be an important pharmacological target in T cell autoimmune diseases. Inhibitors of SLC/CCL21 may be effective agents for treating these T cell infiltrative diseases by interfering with the abnormal recruitment of T cells from the circulation to sites of pathologic inflammation by endothelial cells expressing SLC/CCL21. The reduction in T cell migration into involved tissue would reduce the T-cell inflicted damage seen in those diseases.

Ectopic lymphoid tissue formation is a feature of many chronic inflammatory diseases, including rheumatoid arthritis, inflammatory bowel diseases (Crohn's disease, ulcerative colitis), autoimmune diabetes, chronic inflammatory skin diseases (lichen planus, psoriasis), Hashimoto's thyroiditis, Sjogren's syndrome, gastric lymphomas and chronic inflammatory liver disease (Girard and Springer (1995) Immunol Today 16:449-457; Takemura et al. (2001) J Immunol 167:1072-1080; Grant et al. (2002) Am J Pathol 160:1445-55; Yoneyama et al. (2001) J Exp Med 193:35-49).

Ectopic expression of SLC/CCL21 has been shown to induce lymphoid neogenesis both in mice and in human inflammatory diseases. In mice, transgenic expression of SLC/CCL21 in the pancreas (Fan et al. (2000) J Immunol 164:3955-3959; Chen et al. (2002) J Immunol 168:1001-1008; Luther et al. (2002) J Immunol 169:424-433), a non-lymphoid tissue, has been found to be sufficient for the development and organization of ectopic lymphoid tissue through differential recruitment of T and B lymphocytes and induction of high endothelial venules, specialized blood vessels for lymphocyte migration (Girard and Springer (1995) Immunol Today 16:449-457). In humans, hepatic expression of SLC/CCL21 has been shown to promote the development of high endothelial venules and portal-associated lymphoid tissue in chronic inflammatory liver disease (Grant et al. (2002) Am J Pathol 160:1445-55; Yoneyama et al. (2001) J Exp Med 193:35-49). The chronic inflammatory liver disease primary sclerosing cholangitis (PSC) is associated with portal inflammation and the development of neolymphoid tissue in the liver. More than 70% of patients with PSC have a history of inflammatory bowel disease, and strong induction of SLC/CCL21 on CD34(+) vascular endothelium in portal-associated lymphoid tissue in PSC has been reported (Grant et al. (2002) Am J Pathol 160:1445-55). In contrast, CCL21 is absent from LYVE-1(+) lymphatic vessel endothelium. Intrahepatic lymphocytes in PSC include a population of CCR7(+) T cells, only half of which express CD45RA, and which respond to CCL21 in migration assays. The expression of CCL21 in association with mucosal addressin cell adhesion molecule-1 in portal tracts in PSC may promote the recruitment and retention of CCR7(+) mucosal lymphocytes, leading to the establishment of chronic portal inflammation and the expanded portal-associated lymphoid tissue. These findings are supported by studies in an animal model of chronic hepatic inflammation, which have shown that anti-SLC/CCL21 antibodies prevent the development of high endothelial venules and portal-associated lymphoid tissue (Yoneyama et al. (2001) J Exp Med 193:35-49).
Induction of chemokine SLC/CCL21 at a site of inflammation could convert the lesion from an acute to a chronic state with corresponding development of ectopic lymphoid tissue. Blocking chemokine SLC/CCL21 activity in chronic inflammatory diseases may therefore have significant therapeutic value.

As used herein, "SLC/CCL21" and "SLC" are synonymous.

THAP-family members comprising a THAP Domain 

Based on the elucidation of a biological activity of the THAP1 protein in apoptosis as described herein, the inventors have identified and further characterized a novel protein motif, referred to herein as the THAP domain. The THAP domain has been identified by the present inventors in several other polypeptides, as further described herein. Knowledge of the structure and function of the THAP domain makes it possible to perform screening assays that can be used in the preparation or screening of medicaments capable of modulating interaction with a THAP-family target molecule, modulating cell cycle and cell proliferation, inducing apoptosis, or enhancing or participating in the induction of apoptosis.

As used interchangeably herein, a THAP-family protein or polypeptide, or a THAP-family 
member refers to any polypeptide having a THAP domain as described herein. As mentioned, the 
inventors have provided several specific THAP-family members. Thus, as referred to herein, a 
THAP-family protein or polypeptide, or a THAP-family member, includes but is not limited to a 
THAP-0, THAP1, THAP-2, THAP-3, THAP-4, THAP-5, THAP-6, THAP-7, THAP-8, THAP-9, 
THAP10 or a THAP11 polypeptide.

As used interchangeably herein, a "THAP-family activity", "biological activity of a THAP-family member" or "functional activity of a THAP-family member" refers to an activity exerted by a THAP-family or THAP domain polypeptide or nucleic acid molecule, or a biologically active fragment or homologue thereof comprising a THAP domain, as determined in vivo or in vitro according to standard techniques. In one embodiment, a THAP-family activity is a direct activity, such as an association with a THAP-family target molecule or, most preferably, apoptosis induction activity or inhibition of cell proliferation or cell cycle. As used herein, a "THAP-family target molecule" is a molecule with which a THAP-family protein binds or interacts in nature, such that a THAP-family-mediated function is achieved. For example, a THAP-family target molecule can be another THAP-family protein or polypeptide which is substantially identical or which shares structural similarity (e.g., forming a dimer or multimer). In another example, a THAP-family target molecule can be a non-THAP-family protein molecule, or a non-self molecule such as, for example, a Death Domain receptor. Binding or interaction with a THAP-family target molecule (such as THAP1/PAR4 described herein) or with other targets can be detected, for example, using a two hybrid-based assay in yeast to find drugs that disrupt interaction of the THAP-family bait with the target (e.g., PAR4) prey, or an in vitro interaction assay with recombinant THAP-family and target proteins (e.g., THAP1 and PAR4). In yet another example, a THAP-family target molecule can be a nucleic acid molecule. For instance, a THAP-family target molecule can be DNA.

Alternatively, a THAP-family activity may be an indirect activity, such as an activity 
mediated by interaction of the THAP-family protein with a THAP-family target molecule such that 
the target molecule modulates a downstream cellular activity (e.g., interaction of a THAP-family 
molecule with a THAP-family target molecule can modulate the activity of that target molecule on 
an intracellular signaling pathway). 

THAP-family activity is not limited to the induction of apoptotic activity, but may also 
involve enhancing apoptotic activity. As death domains may mediate protein-protein interactions 
including interactions with other death domains, THAP-family activity may involve transducing a 
cytocidal signal. 

Assays to detect apoptosis are well known. In a preferred example, an assay is based on 
serum-withdrawal induced apoptosis in a 3T3 cell line with tetracycline-regulated expression of a 
THAP family member comprising a THAP domain. Other non-limiting examples are also 
described. 

In one example, a THAP family or THAP domain polypeptide, or a biologically active 
fragment or homologue thereof can be the minimum region of a polypeptide that is necessary and 
sufficient for the generation of cytotoxic death signals. Exemplary assays for apoptosis activity are 
further provided herein. 

In specific embodiments, PAR4 is a preferred THAP1 and/or THAP2 target molecule. In 
another aspect, a THAP1 target molecule is a PML-NB protein. 

In further aspects, a THAP domain or a THAP-family polypeptide comprises a DNA binding domain.

In other aspects, a THAP-family activity is detected by assessing any of the following activities: (1) mediating apoptosis or cell proliferation when expressed in or introduced into a cell, most preferably inducing or enhancing apoptosis, and/or most preferably reducing cell proliferation; (2) mediating apoptosis or cell proliferation of an endothelial cell; (3) mediating apoptosis or cell proliferation of a hyperproliferative cell; (4) mediating apoptosis or cell proliferation of a CNS cell, preferably a neuronal or glial cell; (5) an activity determined in an animal selected from the group consisting of mediating (preferably inhibiting) angiogenesis, mediating (preferably inhibiting) inflammation, inhibition of metastatic potential of cancerous tissue, reduction of tumor burden, increase in sensitivity to chemotherapy or radiotherapy, killing a cancer cell, inhibition of the growth of a cancer cell, or induction of tumor regression; or (6) interaction with a THAP-family target molecule or THAP domain target molecule, preferably interaction with a protein or a nucleic acid. Detecting THAP-family activity may also comprise detecting any suitable therapeutic endpoint discussed herein in the section titled "Methods of Treatment". THAP-family activity may be assessed either in vitro (cell or non-cell based) or in vivo, depending on the assay type and format.

A THAP domain has been identified in the N-terminal region of the THAP1 protein, from
about amino acid 1 to about amino acid 89 of SEQ ID NO: 3, based on sequence analysis and
functional assays. A THAP domain has also been identified in THAP2 through THAP11 and THAP0
(SEQ ID NOs: 4-14). However, it will be appreciated that a functional THAP domain may be only a
small portion of the protein, about 10 amino acids to about 15 amino acids, or from about 20 amino
acids to about 25 amino acids, or from about 30 amino acids to about 35 amino acids, or from about
40 amino acids to about 45 amino acids, or from about 50 amino acids to about 55 amino acids, or
from about 60 amino acids to about 70 amino acids, or from about 80 amino acids to about 90 amino
acids, or about 100 amino acids in length. Alternatively, THAP domain or THAP family polypeptide
activity, as defined above, may require a larger portion of the native protein than may be defined by
protein-protein interaction, DNA binding, cell assays or sequence alignment. A portion of a
THAP domain-containing polypeptide from about 110 amino acids to about 115 amino acids, or
from about 120 amino acids to about 130 amino acids, or from about 140 amino acids to about 150
amino acids, or from about 160 amino acids to about 170 amino acids, or from about 180 amino
acids to about 190 amino acids, or from about 200 amino acids to about 250 amino acids, or from
about 300 amino acids to about 350 amino acids, or from about 400 amino acids to about 450 amino
acids, or from about 500 amino acids to about 600 amino acids, to the extent that said length is
consistent with the SEQ ID NO, or the full-length protein, for example any full-length protein in
SEQ ID NOs: 1-114, may be required for function.

As discussed, the invention includes a novel protein domain, including several examples of 
THAP-family members. The invention thus encompasses a THAP-family member comprising a 
polypeptide having at least a THAP domain sequence in the protein or corresponding nucleic acid 
molecule, preferably a THAP domain sequence corresponding to SEQ ID NOs: 1-2. A THAP- 
family member may comprise an amino acid sequence of at least about 25, 30, 35, 40, 45, 50, 60, 
70, 80 to 90 amino acid residues in length, of which at least about 50-80%, preferably at least about 
60-70%, more preferably at least about 65%, 75% or 90% of the amino acid residues are identical 
or similar to the corresponding residues of the THAP consensus domain (SEQ ID NOs: 1-2).

Identity or similarity may be determined using any desired algorithm, including the 
algorithms and parameters for determining homology which are described herein. 

Optionally, a THAP-domain-containing THAP-family polypeptide comprises a nuclear
localization sequence (NLS). As used herein, the term nuclear localization sequence refers to an
amino acid sequence allowing the THAP-family polypeptide to be localized or transported to the
cell nucleus. A nuclear localization sequence generally comprises at least about 10, preferably about
13, preferably about 16, more preferably about 19, and even more preferably about 21, 23, 25, 30, 35
or 40 amino acid residues. Alternatively, a THAP-family polypeptide may comprise a deletion of part
or all of the NLS, or a substitution or insertion in an NLS sequence, such that the THAP-family
polypeptide is not localized or transported to the cell nucleus.

Isolated proteins of the present invention, preferably THAP family or THAP domain
polypeptides, or biologically active fragments or homologues thereof, have an amino acid
sequence sufficiently homologous to the consensus amino acid sequence of SEQ ID NOs: 1-2. As
used herein, the term "sufficiently homologous" refers to a first amino acid or nucleotide sequence
which contains a sufficient or minimum number of identical or equivalent (e.g., an amino acid
residue which has a similar side chain) amino acid residues or nucleotides to a second amino acid or
nucleotide sequence such that the first and second amino acid or nucleotide sequences share
common structural domains or motifs and/or a common functional activity. For example, amino
acid or nucleotide sequences which share common structural domains, have at least about 30-40%
identity, preferably at least about 40-50% identity, more preferably at least about 50-60%, and even
more preferably at least about 60-70%, 70-80%, 80%, 90%, 95%, 97%, 98%, 99% or 99.8%
identity across the amino acid sequences of the domains, and contain at least one and preferably two
structural domains or motifs, are defined herein as sufficiently homologous. Furthermore, amino
acid or nucleotide sequences which share at least about 30%, preferably at least about 40%, more
preferably at least about 60%, 70%, 80%, 90%, 95%, 97%, 98%, 99% or 99.8% identity and share a
common functional activity are defined herein as sufficiently homologous.
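The identity thresholds above presuppose a way of scoring one sequence against another, and the text leaves the algorithm open. As a minimal sketch, assuming the two sequences have already been aligned by some method and padded with "-" gap characters to equal length, percent identity can be counted over the ungapped columns. The function name and the gap convention are illustrative assumptions, not the inventors' method; other conventions (for example, dividing by the full alignment length) give different values.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Assumption: columns in which either sequence carries a gap are ignored,
    and identity is computed over the remaining (ungapped) columns only.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # gapped column: not counted
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        raise ValueError("alignment has no ungapped columns")
    return 100.0 * identical / compared


# Toy alignment (not a THAP sequence): 4 of the 5 ungapped columns match.
print(percent_identity("MKV-LA", "MKVQLS"))  # 80.0
```

The same arithmetic applies to the 41% identity reported for the THAP1 interferon gamma homology motif: over a 34-residue ungapped overlap, 14 identical residues give 100 x 14/34, or about 41.2%.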

It will be appreciated that the invention encompasses any of the THAP-family polypeptides, as
well as fragments thereof, nucleic acids complementary thereto and nucleic acids capable of
hybridizing thereto under stringent conditions.
THAP-0 to THAP11

As mentioned, the inventors have identified several THAP-family members, including
THAP-0, THAP1, THAP-2, THAP-3, THAP-4, THAP-5, THAP-6, THAP-7, THAP-8, THAP-9,
THAP10 and THAP11.

THAP1 Nucleic Acids

The human THAP1 coding sequence, which is approximately 639 nucleotides in length and is
contained in the cDNA shown in SEQ ID NO: 160, encodes a protein which is approximately 213
amino acid residues in length. One aspect of the invention pertains to purified or isolated nucleic
acid molecules that encode THAP1 proteins or biologically active portions thereof, as further
described herein, as well as nucleic acid fragments thereof. Said nucleic acids may be used, for
example, in therapeutic methods and drug screening assays as further described herein.

The human THAP1 gene is localized at chromosomes 8, 18 and 11.

The THAP1 protein comprises a THAP domain at amino acids 1-89, the role of which in
apoptosis is further demonstrated herein. The THAP1 protein comprises an interferon gamma
homology motif at amino acids 136-169 of human THAP1
(NYTVEDTMHQRKRIHQLEQQVEKLRKKLKTAQQR) (SEQ ID NO: 178), exhibiting 41%
identity in a 34-residue overlap with human interferon gamma (amino acids 98-131). PML-NBs are
closely linked to IFNgamma: many PML-NB components are induced by IFNgamma, and the
promoters of the corresponding genes contain IFNgamma-responsive elements. The THAP1 protein
also includes a nuclear localization sequence at amino acids 146-165 of human THAP1
(RKRIHQLEQQVEKLRKKLKT) (SEQ ID NO: 179). This sequence is responsible for the localization
of THAP1 in the nucleus. As demonstrated in the examples provided herein, deletion mutants of
THAP1 lacking this sequence are no longer localized in the cell nucleus. The THAP1 protein
further comprises a PAR4 binding motif (LE(X) H QRXRRQXR(X) } i QR/KE) (SEQ ID NO: 180).
The core of this motif has been defined experimentally by site-directed mutagenesis and by
comparison with mouse ZIP/DAP-like kinase (another PAR4 binding partner); it overlaps amino
acids 168-175 of human THAP1, but the motif may also include a few residues upstream and
downstream.
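Because the interferon gamma homology motif (residues 136-169, SEQ ID NO: 178) and the nuclear localization sequence (residues 146-165, SEQ ID NO: 179) are each given both as a sequence and as 1-based, inclusive coordinates, the two descriptions can be cross-checked. The short sketch below converts protein coordinates into Python slice offsets; the helper names are illustrative only.

```python
# Sequences and 1-based, inclusive coordinates exactly as quoted for human THAP1.
IFNG_MOTIF = "NYTVEDTMHQRKRIHQLEQQVEKLRKKLKTAQQR"  # residues 136-169 (SEQ ID NO: 178)
IFNG_START, IFNG_END = 136, 169
NLS = "RKRIHQLEQQVEKLRKKLKT"  # residues 146-165 (SEQ ID NO: 179)
NLS_START, NLS_END = 146, 165


def span_length(start: int, end: int) -> int:
    """Number of residues in a 1-based, inclusive range."""
    return end - start + 1


def extract(segment: str, segment_start: int, start: int, end: int) -> str:
    """Return residues start..end (protein numbering) from a segment whose
    first residue sits at protein position segment_start."""
    offset = start - segment_start  # 0-based offset inside the segment
    return segment[offset:offset + span_length(start, end)]


assert len(IFNG_MOTIF) == span_length(IFNG_START, IFNG_END)  # 34 residues
assert len(NLS) == span_length(NLS_START, NLS_END)  # 20 residues
assert extract(IFNG_MOTIF, IFNG_START, NLS_START, NLS_END) == NLS
print("NLS lies inside the interferon gamma homology motif")
```

The check confirms that the 20-residue NLS is contained within the 34-residue homology motif, beginning at the motif's eleventh residue.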

ESTs corresponding to THAP1 have been identified, and may be specifically included or
excluded from the nucleic acids of the invention. The ESTs, as indicated below by accession
number, provide evidence for tissue distribution for THAP1 as follows: AL582975 (B cells from
Burkitt lymphoma); BG708372 (Hypothalamus); BG563619 (liver); BG497522 (adenocarcinoma);
BG616699 (liver); BE932253 (head_neck); AL530396 (neuroblastoma cells).

An object of the invention is a purified, isolated, or recombinant nucleic acid comprising 
the nucleotide sequence of SEQ ID NO: 160, complementary sequences thereto, and fragments 
thereof. The invention also pertains to a purified or isolated nucleic acid comprising a 
polynucleotide having at least 95% nucleotide identity with a polynucleotide of SEQ ID NO: 160, 
advantageously 99% nucleotide identity, preferably 99.5% nucleotide identity and most preferably
99.8% nucleotide identity with a polynucleotide of SEQ ID NO: 160, or a sequence complementary
thereto or a biologically active fragment thereof. Another object of the invention relates to purified, 
isolated or recombinant nucleic acids comprising a polynucleotide that hybridizes, under the 
stringent hybridization conditions defined herein, with a polynucleotide of SEQ ID NO: 160, or a 
sequence complementary thereto or a variant thereof or a biologically active fragment thereof. In 
further embodiments, nucleic acids of the invention include isolated, purified, or recombinant 
polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 
80, 90, 100, 150, 200, 500, or 1000 nucleotides of SEQ ID NO: 160, or the complements thereof.
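A "contiguous span of at least k nucleotides" of a reference sequence can begin at any position that leaves room for k nucleotides. A minimal sketch that enumerates such windows and counts them is shown below; the function names are hypothetical, and the toy sequence is a placeholder rather than SEQ ID NO: 160.

```python
from typing import Iterator, Tuple


def contiguous_spans(seq: str, k: int) -> Iterator[Tuple[int, str]]:
    """Yield (1-based start position, span) for every contiguous span of length k."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    for i in range(len(seq) - k + 1):
        yield i + 1, seq[i:i + k]


def number_of_spans(seq_length: int, k: int) -> int:
    """A sequence of length L contains L - k + 1 spans of length k (0 if k > L)."""
    return max(seq_length - k + 1, 0)


print(list(contiguous_spans("ACGTA", 3)))  # [(1, 'ACG'), (2, 'CGT'), (3, 'GTA')]
print(number_of_spans(2173, 12))  # 2162
```

Taking the 2173-nucleotide length implied by the coordinates given for SEQ ID NO: 160, there are 2173 - 12 + 1 = 2162 possible starting positions for a 12-nucleotide span.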

Also encompassed is a purified, isolated, or recombinant polynucleotide encoding a THAP1
polypeptide of the invention, as further described herein.

In another preferred aspect, the invention pertains to purified or isolated nucleic acid
molecules that encode a portion or variant of a THAP1 protein, wherein the portion or variant
displays a THAP1 activity of the invention. Preferably said portion or variant is a portion or variant
of a naturally occurring full-length THAP1 protein. In one example, the invention provides a
polynucleotide comprising, consisting essentially of, or consisting of a contiguous span of at least
12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of SEQ ID
NO: 160, wherein said nucleic acid encodes a THAP1 portion or variant having a THAP1 activity
described herein. In other embodiments, the invention relates to a polynucleotide encoding a
THAP1 portion consisting of 8-20, 20-50, 50-70, 60-100, 100-150, 150-200, 200-205 or 205-212
amino acids of SEQ ID NO: 3, or a variant thereof, wherein said THAP1 portion displays a THAP1
activity described herein.

The sequence of SEQ ID NO: 160 corresponds to the human THAP1 cDNA. This cDNA
comprises sequences encoding the human THAP1 protein (i.e., "the coding region", nucleotides 202
to 840), as well as 5' untranslated sequences (nucleotides 1-201) and 3' untranslated sequences
(nucleotides 841 to 2173).
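The region boundaries above are 1-based and inclusive, so extracting them in a 0-based language needs a one-position shift at the start. The sketch below assumes only the coordinates stated in the text; because the actual sequence of SEQ ID NO: 160 is not reproduced here, it runs on a placeholder string of the same length, and the function name is hypothetical.

```python
def split_cdna(cdna: str, cds_start: int, cds_end: int):
    """Split a cDNA into (5' UTR, coding region, 3' UTR).

    cds_start and cds_end are 1-based and inclusive, as in the text.
    """
    if not 1 <= cds_start <= cds_end <= len(cdna):
        raise ValueError("coding-region coordinates fall outside the sequence")
    utr5 = cdna[:cds_start - 1]
    cds = cdna[cds_start - 1:cds_end]
    utr3 = cdna[cds_end:]
    return utr5, cds, utr3


# Placeholder of the stated length; not the real SEQ ID NO: 160 sequence.
placeholder = "N" * 2173
utr5, cds, utr3 = split_cdna(placeholder, 202, 840)
print(len(utr5), len(cds), len(utr3), len(cds) // 3)  # 201 639 1333 213
```

The 639-nucleotide coding region is an exact multiple of three (213 codons), consistent with the approximately 213-residue THAP1 protein described above.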

Also encompassed by the THAP1 nucleic acids of the invention are nucleic acid molecules
which are complementary to THAP1 nucleic acids described herein. Preferably, a complementary
nucleic acid is sufficiently complementary to the nucleotide sequence shown in SEQ ID NO: 160
that it can hybridize to that sequence, thereby forming a stable duplex.
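A strand that pairs with a given sequence over its full length in an antiparallel duplex is that sequence's reverse complement. The sketch below assumes plain DNA letters (A, C, G, T, and N for an unknown base) and perfect Watson-Crick pairing. It does not model the stringent hybridization conditions defined elsewhere in this document, and the helper names are hypothetical.

```python
# Watson-Crick complements for DNA; N (unknown base) pairs with N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement, read 5'->3', of a DNA sequence."""
    invalid = set(seq) - set("ACGTNacgtn")
    if invalid:
        raise ValueError(f"unexpected characters: {sorted(invalid)}")
    return seq.translate(_COMPLEMENT)[::-1]


def is_perfect_duplex(strand_a: str, strand_b: str) -> bool:
    """True if strand_b (written 5'->3') pairs with strand_a at every position."""
    return strand_b.upper() == reverse_complement(strand_a).upper()


print(reverse_complement("ATGGTG"))  # CACCAT
```

Applying the function twice returns the original sequence, which is a quick sanity check on the complement table.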

Another object of the invention is a purified, isolated, or recombinant nucleic acid encoding
a THAP1 polypeptide comprising, consisting essentially of, or consisting of the amino acid
sequence of SEQ ID NO: 3, or fragments thereof, wherein the isolated nucleic acid molecule
encodes one or more motifs selected from the group consisting of a THAP domain, a THAP1 target
binding region, a nuclear localization signal and an interferon gamma homology motif. Preferably
said THAP1 target binding region is a PAR4 binding region or a DNA binding region. For
example, the purified, isolated or recombinant nucleic acid may comprise a genomic DNA or
fragment thereof which encodes the polypeptide of SEQ ID NO: 3 or a fragment thereof, or a
cDNA consisting of, consisting essentially of, or comprising the sequence of SEQ ID NO: 160 or
fragments thereof, wherein the isolated nucleic acid molecule encodes one or more motifs selected
from the group consisting of a THAP domain, a THAP1 target binding region, a nuclear
localization signal and an interferon gamma homology motif. Any combination of said motifs may
also be specified. Preferably said THAP1 target binding region is a PAR4 binding region or a DNA
binding region. Particularly preferred nucleic acids of the invention include isolated, purified, or
recombinant THAP1 nucleic acids comprising, consisting essentially of, or consisting of a
contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200 or 300
nucleotides of a sequence selected from the group consisting of the nucleotide position ranges
607 to 708, 637 to 696 and 703 to 747 of SEQ ID NO: 160. In preferred embodiments, a THAP1
nucleic acid encodes a THAP1 polypeptide comprising at least two THAP1 functional domains,
such as, for example, a THAP domain and a PAR4 binding region.
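Since the THAP1 coding region begins at nucleotide 202 of SEQ ID NO: 160, cDNA positions map to residue numbers of SEQ ID NO: 3 by codon arithmetic. The sketch below (helper names hypothetical) translates the three nucleotide ranges listed above into residue ranges, assuming only the coordinates stated in the text.

```python
CDS_START = 202  # first nucleotide of the THAP1 coding region in SEQ ID NO: 160


def nt_to_residue(nt_pos: int, cds_start: int = CDS_START) -> int:
    """Residue number (1-based) whose codon contains cDNA position nt_pos (1-based)."""
    if nt_pos < cds_start:
        raise ValueError("position lies in the 5' untranslated region")
    return (nt_pos - cds_start) // 3 + 1


def residue_to_nt_range(first: int, last: int, cds_start: int = CDS_START):
    """cDNA positions (1-based, inclusive) spanning the codons of residues first..last."""
    return cds_start + 3 * (first - 1), cds_start + 3 * last - 1


for start, end in [(607, 708), (637, 696), (703, 747)]:
    print(start, end, "->", nt_to_residue(start), nt_to_residue(end))
# 607 708 -> 136 169
# 637 696 -> 146 165
# 703 747 -> 168 182
```

On this arithmetic, nucleotides 607-708 encode residues 136-169 (the interferon gamma homology motif), nucleotides 637-696 encode residues 146-165 (the nuclear localization sequence), and nucleotides 703-747 encode residues 168-182, a window covering the PAR4 binding core at residues 168-175 together with downstream residues. Each range starts and ends on a codon boundary.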

In further preferred embodiments, a THAP1 nucleic acid comprises a nucleotide sequence
encoding a THAP domain having the consensus amino acid sequence of the formula of SEQ ID
NOs: 1-2. A THAP1 nucleic acid may also encode a THAP domain wherein at least about 95%,
90%, 85%, 50-80%, preferably at least about 60-70%, more preferably at least about 65% of the
amino acid residues are identical or similar to the corresponding residues of the THAP domain
consensus sequence (SEQ ID NOs: 1-2). The present invention also embodies isolated, purified, and
recombinant polynucleotides which encode a polypeptide comprising a contiguous span of at least 6
amino acids, preferably at least 8 or 10 amino acids, more preferably at least 15, 25, 30, 35, 40, 45,
50, 60, 70, 80 or 90 amino acids according to the formula of SEQ ID NOs: 1-2.

The nucleotide sequence determined from the cloning of the THAP1 gene allows for the
generation of probes and primers designed for use in identifying and/or cloning other THAP1
family members (e.g., sharing the novel functional domains), as well as THAP1 homologues from
other species.

A nucleic acid fragment encoding a "biologically active portion of a THAP1 protein" can
be prepared by isolating a portion of the nucleotide sequence of SEQ ID NO: 160 which encodes a
polypeptide having a THAP1 biological activity (the biological activities of the THAP1 proteins are
described herein), expressing the encoded portion of the THAP1 protein (e.g., by recombinant
expression in vitro or in vivo) and assessing the activity of the encoded portion of the THAP1
protein.

The invention further encompasses nucleic acid molecules that differ from the THAP1
nucleotide sequences of the invention due to degeneracy of the genetic code and thus encode the
same THAP1 proteins and fragments of the invention.
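Because most amino acids are specified by more than one codon, distinct nucleotide sequences can encode the same polypeptide. The sketch below assumes the standard genetic code (NCBI translation table 1) and shows two different coding sequences translating to the same peptide. The example sequences are arbitrary and are not taken from SEQ ID NO: 160.

```python
BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate a coding sequence codon by codon; '*' marks a stop codon."""
    cds = cds.upper().replace("U", "T")
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


# CTG/TTA both encode leucine and AAA/AAG both encode lysine.
print(translate("ATGCTGAAA"), translate("ATGTTAAAG"))  # MLK MLK
```

Swapping synonymous codons in this way changes the nucleotide sequence while leaving the encoded protein unchanged, which is the class of variants the paragraph above describes.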

In addition to the THAP1 nucleotide sequences described above, it will be appreciated by 
those skilled in the art that DNA sequence polymorphisms that lead to changes in the amino acid 
sequences of the THAP1 proteins may exist within a population (e.g., the human population). Such 
genetic polymorphism may exist among individuals within a population due to natural allelic 
variation. Such natural allelic variations can typically result in 1-5% variance in the nucleotide 
sequence of a THAP1 gene. 

Nucleic acid molecules corresponding to natural allelic variants and homologues of the 
THAP1 nucleic acids of the invention can be isolated based on their homology to the THAP1 
nucleic acids disclosed herein using the cDNAs disclosed herein, or a portion thereof, as a 
hybridization probe according to standard hybridization techniques under stringent hybridization 
conditions. 

Probes based on the THAP1 nucleotide sequences can be used to detect transcripts or
genomic sequences encoding the same or homologous proteins. In preferred embodiments, the
probe further comprises a label group attached thereto, e.g., the label group can be a radioisotope, a
fluorescent compound, an enzyme, or an enzyme co-factor. Such probes can be used as a part of a
diagnostic test kit for identifying cells or tissue which misexpress a THAP1 protein, such as by
measuring a level of a THAP1-encoding nucleic acid in a sample of cells from a subject, e.g.,
detecting THAP1 mRNA levels or determining whether a genomic THAP1 gene has been mutated
or deleted.

THAP1 Polypeptides 

The term "THAP1 polypeptides" is used herein to embrace all of the proteins and 
polypeptides of the present invention. Also forming part of the invention are polypeptides encoded 
by the polynucleotides of the invention, as well as fusion polypeptides comprising such 
polypeptides. The invention embodies THAP1 proteins from humans, including isolated or purified 
THAP1 proteins consisting of, consisting essentially of, or comprising the sequence of 
SEQ ID NO: 3. 

The invention concerns the polypeptide encoded by a nucleotide sequence of SEQ ID NO:
160, a complementary sequence thereof or a fragment thereof.

The present invention embodies isolated, purified, and recombinant polypeptides 
comprising a contiguous span of at least 6 amino acids, preferably at least 8 to 10 amino acids, 
more preferably at least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of SEQ ID NO: 3. In other 
preferred embodiments the contiguous stretch of amino acids comprises the site of a mutation or 
functional mutation, including a deletion, addition, swap or truncation of the amino acids in the 
THAP1 protein sequence. The invention also concerns the polypeptide encoded by the THAP1 
nucleotide sequences of the invention, or a complementary sequence thereof or a fragment thereof. 

One aspect of the invention pertains to isolated THAP1 proteins, and biologically active 
portions thereof, as well as polypeptide fragments suitable for use as immunogens to raise anti- 
THAP1 antibodies. In one embodiment, native THAP1 proteins can be isolated from cells or tissue 
sources by an appropriate purification scheme using standard protein purification techniques. In 
another embodiment, THAP1 proteins are produced by recombinant DNA techniques. As an alternative
to recombinant expression, a THAP1 protein or polypeptide can be synthesized chemically using
standard peptide synthesis techniques. 

Typically, biologically active portions comprise a domain or motif with at least one activity 
of the THAP1 protein. The present invention also embodies isolated, purified, and recombinant 
portions or fragments of one THAP1 polypeptide comprising a contiguous span of at least 6 amino 
acids, preferably at least 8 to 10 amino acids, more preferably at least 12, 15, 20, 25, 30, 40, 50, 100 
or 200 amino acids of SEQ ID NO: 3. Also encompassed are THAP1 polypeptides which comprise
between 10 and 20, between 20 and 50, between 30 and 60, between 50 and 100, or between 100 
and 200 amino acids of SEQ ID NO: 3. In other preferred embodiments the contiguous stretch of 
amino acids comprises the site of a mutation or functional mutation, including a deletion, addition, 
swap or truncation of the amino acids in the THAP1 protein sequence. 

A biologically active THAP1 protein may, for example, comprise at least 1, 2, 3, 5, 10, 20
or 30 amino acid changes from the sequence of SEQ ID NO: 3, or may differ from the sequence of
SEQ ID NO: 3 in at least 1%, 2%, 3%, 5%, 8%, 10% or 15% of its amino acids.
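When a variant differs from SEQ ID NO: 3 only by substitutions, amino acid changes can be counted position by position. The sketch below (names hypothetical) handles that case only; insertions and deletions would first require an alignment. It also converts a percentage threshold into a minimum number of changed residues for a sequence of a given length.

```python
def substitutions(reference: str, variant: str):
    """List (1-based position, reference residue, variant residue) for each mismatch.

    Assumes equal-length sequences, i.e. substitutions only.
    """
    if len(reference) != len(variant):
        raise ValueError("this sketch handles substitutions only (equal lengths)")
    return [(i + 1, r, v) for i, (r, v) in enumerate(zip(reference, variant)) if r != v]


def percent_changed(reference: str, variant: str) -> float:
    """Share of reference positions that carry a substitution, in percent."""
    return 100.0 * len(substitutions(reference, variant)) / len(reference)


def min_changes_for_percent(length: int, percent: int) -> int:
    """Smallest number of changed residues reaching at least `percent`% of `length`."""
    return (length * percent + 99) // 100  # integer ceiling of length * percent / 100


print(substitutions("ACDEFGHIK", "ACDQFGHIR"))  # [(4, 'E', 'Q'), (9, 'K', 'R')]
print(min_changes_for_percent(213, 1), min_changes_for_percent(213, 15))  # 3 32
```

For a sequence of about 213 residues, "at least 1%" therefore corresponds to at least 3 changed residues and "at least 15%" to at least 32.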

In a preferred embodiment, the THAP1 protein comprises, consists essentially of, or
consists of a THAP domain at amino acid positions 1 to 89 shown in SEQ ID NO: 3, or fragments
or variants thereof. In other aspects, a THAP1 polypeptide comprises a THAP1 target binding
region, a nuclear localization signal and/or an interferon gamma homology motif. Preferably a
THAP1 target binding region is a PAR4 binding region or a DNA binding region. The invention
also concerns the polypeptide encoded by the THAP1 nucleotide sequences of the invention, or a
complementary sequence thereof or a fragment thereof. The present invention thus also embodies
isolated, purified, and recombinant polypeptides comprising, consisting essentially of or consisting
of a contiguous span of at least 6 amino acids, preferably at least 8 to 10 amino acids, more
preferably at least 12, 15, 20, 25, 30, 40, 50, 70, 80, 90 or 100 amino acids of an amino acid
sequence selected from the group consisting of positions 1 to 90, 136 to 169, 146 to 165 and 168 to
175 of SEQ ID NO: 3. In another aspect, a THAP1 polypeptide may comprise a THAP domain
wherein at least about 95%, 90%, 85%, 50-80%, preferably at least about 60-70%, more preferably
at least about 65% of the amino acid residues are identical or similar to the corresponding residues
of the THAP domain consensus sequence (SEQ ID NOs: 1-2). Also encompassed by the present
invention are isolated or purified nucleic acids encoding a THAP1 polypeptide comprising,
consisting essentially of, or consisting of a THAP domain at amino acid positions 1 to 90 shown in
SEQ ID NO: 3, or fragments or variants thereof.

In other embodiments, the THAP1 protein is substantially homologous to the sequence of
SEQ ID NO: 3 and retains the functional activity of the THAP1 protein, yet differs in amino acid
sequence due to natural allelic variation or mutagenesis, as described further herein. Accordingly, in
another embodiment, the THAP1 protein is a protein which comprises an amino acid sequence that
shares more than about 60% but less than 100% homology with the amino acid sequence of SEQ ID
NO: 3 and retains the functional activity of the THAP1 protein of SEQ ID NO: 3.
Preferably, the protein is at least about 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 92%, 95%,
97%, 98%, 99% or 99.8% homologous to SEQ ID NO: 3, but is not identical to SEQ ID NO: 3.
Preferably the THAP1 protein is not identical (i.e., has less than 100% identity) to a naturally
occurring THAP1. Percent homology can be determined as further detailed above.
THAP-2 to THAP11 and THAP-0 Nucleic Acids 

As mentioned, the invention provides several members of the THAP-family. THAP-2,
THAP-3, THAP-4, THAP-5, THAP-6, THAP-7, THAP-8, THAP-9, THAP10, THAP11 and
THAP-0 are described herein. The human and mouse nucleotide sequences corresponding to the
human cDNA sequences are listed in SEQ ID NOs: 161-171, and the human amino acid sequences
are listed, respectively, in SEQ ID NOs: 4-14. Also encompassed by the invention are orthologs of
said THAP-family sequences, including mouse, rat, pig and other orthologs, the amino acid
sequences of which are listed in SEQ ID NOs: 16-114 and the cDNA sequences of which are listed
in SEQ ID NOs: 172-175.
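The per-member figures given in the subsections that follow (cDNA and protein SEQ ID NOs, stated lengths, and the extent of the THAP domain) fit a simple record structure. The sketch below transcribes the values for THAP-2 through THAP10 as stated in this section; THAP11 and THAP-0 are omitted because their details are not given here. It checks that each cDNA is long enough to encode the stated protein. The class and field names are illustrative, and the protein lengths are the approximate values quoted in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThapMember:
    name: str
    cdna_seq_id: int
    cdna_nt: int          # cDNA length in nucleotides, as stated
    protein_seq_id: int
    protein_aa: int       # approximate protein length in residues, as stated
    thap_domain_end: int  # THAP domain spans residues 1..thap_domain_end


MEMBERS = [
    ThapMember("THAP-2", 161, 1302, 4, 228, 89),
    ThapMember("THAP-3", 162, 1995, 5, 239, 89),
    ThapMember("THAP-4", 163, 1999, 6, 577, 90),
    ThapMember("THAP-5", 164, 1034, 7, 239, 90),
    ThapMember("THAP-6", 165, 2291, 8, 222, 90),
    ThapMember("THAP-7", 166, 1242, 9, 309, 90),
    ThapMember("THAP-8", 167, 1383, 10, 274, 92),
    ThapMember("THAP-9", 168, 693, 11, 231, 92),
    ThapMember("THAP10", 169, 771, 12, 257, 90),
]

for m in MEMBERS:
    # A cDNA must contain at least 3 nucleotides per encoded residue.
    fits = 3 * m.protein_aa <= m.cdna_nt
    share = m.thap_domain_end / m.protein_aa
    print(f"{m.name}: coding capacity ok={fits}, THAP domain = {share:.0%} of protein")
```

For THAP-9 and THAP10 the stated cDNA length is exactly three times the stated protein length (693 = 3 x 231; 771 = 3 x 257). Taken at face value, those two sequences would therefore consist of coding sequence only.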
THAP-2 

The human THAP-2 cDNA, which is approximately 1302 nucleotides in length and is shown in
SEQ ID NO: 161, encodes a protein which is approximately 228 amino acid residues in length,
shown in SEQ ID NO: 4. One aspect of the invention pertains to purified or isolated nucleic acid
molecules that encode THAP-2 proteins or biologically active portions thereof as further described
herein, as well as nucleic acid fragments thereof. Said nucleic acids may be used for example in
therapeutic methods and drug screening assays as further described herein. The human THAP-2
gene is localized at chromosomes 12 and 3. The THAP-2 protein comprises a THAP domain at
amino acids 1 to 89. Analysis of expressed sequences (accession numbers indicated, which may be
specifically included or excluded from the nucleic acids of the invention) in databases suggests that
THAP-2 is expressed as follows: BG677995 (squamous cell carcinoma); AV718199
(hypothalamus); BI600215 (hypothalamus); AI208780 (Soares_testis_NHT); BE566995
(carcinoma cell line); AI660418 (thymus pooled).
THAP-3 

The human THAP-3 cDNA is approximately 1995 nucleotides in length and is shown in
SEQ ID NO: 162. The THAP-3 gene encodes a protein which is approximately 239 amino acid
residues in length, shown in SEQ ID NO: 5. One aspect of the invention pertains to purified or
isolated nucleic acid molecules that encode THAP-3 proteins or biologically active portions thereof
as further described herein, as well as nucleic acid fragments thereof. Said nucleic acids may be
used for example in therapeutic methods and drug screening assays as further described herein. The
human THAP-3 gene is localized at chromosome 1. The THAP-3 protein comprises a THAP
domain at amino acids 1 to 89. Analysis of expressed sequences (accession numbers indicated,
which may be specifically included or excluded from the nucleic acids of the invention) in
databases suggests that THAP-3 is expressed as follows: BG700517 (hippocampus); BI460812
(testis); BG707197 (hypothalamus); AW960428 (-); BG437177 (large cell carcinoma); BE962820
(adenocarcinoma); BE548411 (cervical carcinoma cell line); AL522189 (neuroblastoma cells);
BE545497 (cervical carcinoma cell line); BE280538 (choriocarcinoma); BI086954 (cervix);
BE744363 (adenocarcinoma cell line); and BI549151 (hippocampus).
THAP-4 

The human THAP-4 cDNA, shown in SEQ ID NO: 163 as a sequence 1999 nucleotides in length,
encodes a protein which is approximately 577 amino acid residues in length, shown in SEQ ID NO:
6. One aspect of the invention pertains to purified or isolated nucleic acid molecules that encode
THAP-4 proteins or biologically active portions thereof as further described herein, as well as
nucleic acid fragments thereof. Said nucleic acids may be used for example in therapeutic methods
and drug screening assays as further described herein. The THAP-4 protein comprises a THAP
domain at amino acids 1 to 90. Analysis of expressed sequences (accession numbers indicated,
which may be specifically included or excluded from the nucleic acids of the invention) in
databases suggests that THAP-4 is expressed as follows: AL544881 (placenta); BE384014
(melanotic melanoma); AL517205 (neuroblastoma cells); BG394703 (retinoblastoma); BG472327
(retinoblastoma); BI196071 (neuroblastoma); BE255202 (retinoblastoma); BI017349 (lung_tumor);
BF972153 (leiomyosarcoma cell line); BG116061 (duodenal adenocarcinoma cell line); AL530558
(neuroblastoma cells); AL520036 (neuroblastoma cells); AL559902 (B cells from Burkitt
lymphoma); AL534539 (Fetal brain); BF686560 (leiomyosarcoma cell line); BF345413 (anaplastic
oligodendroglioma with 1p/19q loss); BG117228 (adenocarcinoma cell line); BG490646 (large cell
carcinoma); and BF769104 (epidtumor).
THAP-5 

The human THAP-5 cDNA, shown in SEQ ID NO: 164 as a sequence 1034 nucleotides in length,
encodes a protein which is approximately 239 amino acid residues in length,
shown in SEQ ID NO: 7. One aspect of the invention pertains to purified or isolated nucleic acid 
molecules that encode THAP-5 proteins or biologically active portions thereof as further described 
herein, as well as nucleic acid fragments thereof. Said nucleic acids may be used for example in 
therapeutic methods and drug screening assays as further described herein. The human THAP-5 
gene is localized at chromosome 7. The THAP-5 protein comprises a THAP domain at amino acids 
1 to 90. Analysis of expressed sequences (accession numbers indicated, which may be specifically 
included or excluded from the nucleic acids of the invention) in databases suggests that THAP-5 is 
expressed as follows: BG575430 (mammary adenocarcinoma cell line); BI545812 (hippocampus); 
BI560073 (testis); BG530461 (embryonal carcinoma); BF244164 (glioblastoma); BI461364 (testis); 
AW407519 (germinal center B cells); BF103690 (embryonal carcinoma); and BF939577 (kidney).

THAP-6 

The human THAP-6 cDNA, shown in SEQ ID NO: 165 as a sequence 2291 nucleotides in length,
encodes a protein which is approximately 222 amino acid residues in length,
shown in SEQ ID NO: 8. One aspect of the invention pertains to purified or isolated nucleic acid 
molecules that encode THAP-6 proteins or biologically active portions thereof as further described 
herein, as well as nucleic acid fragments thereof. Said nucleic acids may be used for example in 
therapeutic methods and drug screening assays as further described herein. The human THAP-6 
gene is localized at chromosome 4. The THAP-6 protein comprises a THAP domain at amino acids 
1 to 90. Analysis of expressed sequences (accession numbers indicated, which may be specifically 
included or excluded from the nucleic acids of the invention) in databases suggests that THAP-6 is 
expressed as follows: AV684783 (hepatocellular carcinoma); AV698391 (hepatocellular 
carcinoma); BI560555 (testis); AV688768 (hepatocellular carcinoma); AV692405 (hepatocellular
carcinoma); and AV696360 (hepatocellular carcinoma). 
THAP-7

The human THAP-7 cDNA, shown in SEQ ID NO: 166 as a sequence 1242 nucleotides in length,
encodes a protein which is approximately 309 amino acid residues in length,
shown in SEQ ID NO: 9. One aspect of the invention pertains to purified or isolated nucleic acid 
molecules that encode THAP-7 proteins or biologically active portions thereof as further described 
herein, as well as nucleic acid fragments thereof. Said nucleic acids may be used for example in 
therapeutic methods and drug screening assays as further described herein. The human THAP-7 
gene is localized at chromosome 22q11.2. The THAP-7 protein comprises a THAP domain at
amino acids 1 to 90. Analysis of expressed sequences (accession numbers indicated, which may be 
specifically included or excluded from the nucleic acids of the invention) in databases suggests that 
THAP-7 is expressed as follows: BI193682 (epithelioid carcinoma cell line); BE253146 
(retinoblastoma); BE622113 (melanotic melanoma); BE740360 (adenocarcinoma cell line); 
BE513955 (Burkitt lymphoma); AL049117 (testis); BF952983 (nervous_normal); AW975614 (-);
BE273270 (renal cell adenocarcinoma); BE738428 (glioblastoma); BE388215 (endometrium 
adenocarcinoma cell line); BF762401 (colon_est); and BG329264 (retinoblastoma). 
THAP-8 

The human THAP-8 cDNA, shown in SEQ ID NO: 167 as a sequence 1383 nucleotides in length,
encodes a protein which is approximately 274 amino acid residues in length,
shown in SEQ ID NO: 10. One aspect of the invention pertains to purified or isolated nucleic acid 
molecules that encode THAP-8 proteins or biologically active portions thereof as further described 
herein, as well as nucleic acid fragments thereof. Said nucleic acids may be used for example in 
therapeutic methods and drug screening assays as further described herein. The human THAP-8 
gene is localized at chromosome 19. The THAP-8 protein comprises a THAP domain at amino 
acids 1 to 92. Analysis of expressed sequences (accession numbers indicated, which may be 
specifically included or excluded from the nucleic acids of the invention) in databases suggests that 
THAP-8 is expressed as follows: BG703645 (hippocampus); BF026346 (melanotic melanoma); 
BE728495 (melanotic melanoma); BG334298 (melanotic melanoma); and BE390697 
(endometrium adenocarcinoma cell line). 
THAP-9 

The human THAP-9 cDNA, which is 693 nucleotides in length and is shown in SEQ ID NO: 168,
encodes a protein of approximately 231 amino acid residues, shown in SEQ ID NO: 11. One aspect of
the invention pertains to purified or isolated nucleic acid molecules that encode THAP-9 proteins or
biologically active portions thereof as further described herein, as well as nucleic acid fragments
thereof. Said nucleic acids may be used, for example, in therapeutic methods and drug screening
assays as further described herein. The THAP-9 protein comprises a THAP domain at amino acids 1
to 92. Analysis of expressed sequences in databases (accession numbers indicated, which may be
specifically included in or excluded from the nucleic acids of the invention) suggests that THAP-9
is expressed as follows: AA333595 (Embryo 8 weeks).

THAP10 

The human THAP10 cDNA, which is 771 nucleotides in length and is shown in SEQ ID NO: 169,
encodes a protein of approximately 257 amino acid residues, shown in SEQ ID NO: 12. One aspect of
the invention pertains to purified or isolated nucleic acid molecules that encode THAP10 proteins or
biologically active portions thereof as further described herein, as well as nucleic acid fragments
thereof. Said nucleic acids may be used, for example, in therapeutic methods and drug screening
assays as further described herein. The human THAP10 gene is localized at chromosome 15. The
THAP10 protein comprises a THAP domain at amino acids 1 to 90. Analysis of expressed sequences
in databases (accession numbers indicated, which may be specifically included in or excluded from
the nucleic acids of the invention) suggests that THAP10 is expressed as follows: AL526710
(neuroblastoma cells); AV725499 (hypothalamus); AW966404 (-); AW296810 (lung); and AL557817
(T cells from T cell leukemia).

THAP11

The human THAP11 cDNA, which is 942 nucleotides in length and is shown in SEQ ID NO: 170,
encodes a protein of approximately 314 amino acid residues, shown in SEQ ID NO: 13. One aspect of
the invention pertains to purified or isolated nucleic acid molecules that encode THAP11 proteins or
biologically active portions thereof as further described herein, as well as nucleic acid fragments
thereof. Said nucleic acids may be used, for example, in therapeutic methods and drug screening
assays as further described herein. The human THAP11 gene is localized at chromosome 16. The
THAP11 protein comprises a THAP domain at amino acids 1 to 90. Analysis of expressed sequences
in databases (accession numbers indicated, which may be specifically included in or excluded from
the nucleic acids of the invention) suggests that THAP11 is expressed as follows: AU142300
(retinoblastoma); BI261822 (lymphoma cell line); BG423102 (renal cell adenocarcinoma); and
BG423864 (kidney).

THAP-0 

The human THAP-0 cDNA, which is 2283 nucleotides in length and is shown in SEQ ID NO: 171,
encodes a protein of approximately 761 amino acid residues, shown in SEQ ID NO: 14. One aspect of
the invention pertains to purified or isolated nucleic acid molecules that encode THAP-0 proteins or
biologically active portions thereof as further described herein, as well as nucleic acid fragments
thereof. Said nucleic acids may be used, for example, in therapeutic methods and drug screening
assays as further described herein. The human THAP-0 gene is localized at chromosome 11. The
THAP-0 protein comprises a THAP domain at amino acids 1 to 90. Analysis of expressed sequences
in databases (accession numbers indicated, which may be specifically included in or excluded from
the nucleic acids of the invention) suggests that THAP-0 is expressed as follows: BE713222
(head_neck); BE161184 (head_neck); AL119452 (amygdala); AU129709 (teratocarcinoma);
AW965460 (-); and BE886885 (leiomyosarcoma).

An object of the invention is a purified, isolated, or recombinant nucleic acid comprising
the nucleotide sequence of SEQ ID NOs: 161-171 or 173-175, or complementary sequences thereto,
and fragments thereof. The invention also pertains to a purified or isolated nucleic acid comprising
a polynucleotide having at least 95% nucleotide identity with a polynucleotide of SEQ ID NOs:
161-171 or 173-175, advantageously 99% nucleotide identity, preferably 99.5% nucleotide identity
and most preferably 99.8% nucleotide identity with a polynucleotide of SEQ ID NOs: 161-171 or
173-175, or a sequence complementary thereto or a biologically active fragment thereof. Another
object of the invention relates to purified, isolated or recombinant nucleic acids comprising a
polynucleotide that hybridizes, under the stringent hybridization conditions defined herein, with a
polynucleotide of SEQ ID NOs: 161-171 or 173-175, or a sequence complementary thereto, or a
variant thereof or a biologically active fragment thereof. In further embodiments, nucleic acids of
the invention include isolated, purified, or recombinant polynucleotides comprising a contiguous
span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000
nucleotides of a sequence selected from the group consisting of SEQ ID NOs: 161-171 and 173-175,
or the complements thereof.
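The percent-identity thresholds recited above (95%, 99%, 99.5% and 99.8%) can be illustrated with a minimal Python sketch. It assumes, for simplicity, that the two polynucleotides have already been aligned to the same length without gaps, so that identity is the number of identical positions divided by the aligned length. The function names and the toy sequence are hypothetical; in practice, identity is usually computed from an alignment produced by a dedicated program such as BLAST rather than by this direct comparison.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity of two pre-aligned, gap-free sequences of equal length."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a == b)
    return 100.0 * matches / len(seq_a)


def thresholds_met(identity: float, thresholds=(95.0, 99.0, 99.5, 99.8)) -> list:
    """Return the identity thresholds (in percent) that the given value meets."""
    return [t for t in thresholds if identity >= t]


# Toy 1000-nt sequence (not a real THAP sequence) and a variant with one mismatch.
reference = "ACGT" * 250
variant = reference[:-1] + "A"
identity = percent_identity(reference, variant)
print(identity, thresholds_met(identity))
```

With a single mismatch in 1000 nucleotides, the identity is 99.9%, which meets every threshold listed above.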

Also encompassed is a purified, isolated, or recombinant polynucleotide encoding a
THAP-2 to THAP11 or THAP-0 polypeptide of the invention, as further described herein.

In another preferred aspect, the invention pertains to purified or isolated nucleic acid
molecules that encode a portion or variant of a THAP-2 to THAP11 or THAP-0 protein, wherein
the portion or variant displays a THAP-2 to THAP11 or THAP-0 activity of the invention.
Preferably, said portion or variant is a portion or variant of a naturally occurring full-length
THAP-2 to THAP11 or THAP-0 protein. In one example, the invention provides a polynucleotide
comprising, consisting essentially of, or consisting of a contiguous span of at least 12, 15, 18, 20,
25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides (to the extent that the
length of said span is consistent with the length of the SEQ ID NO) of a sequence selected from the
group consisting of SEQ ID NOs: 161-171 and 173-175, wherein said nucleic acid encodes a THAP-2
to THAP11 or THAP-0 portion or variant having a THAP-2 to THAP11 or THAP-0 activity
described herein. In another embodiment, the invention relates to a polynucleotide encoding a
THAP-2 to THAP11 or THAP-0 portion consisting of 8-20, 20-50, 50-70, 60-100, 100-150, 150-200,
200-250 or 250-350 amino acids (to the extent that the length of said portion is consistent with the
length of the SEQ ID NO) of a sequence selected from the group consisting of SEQ ID NOs: 4-14,
17-21, 23-40, 42-56, 58-98 and 100-114, or a variant thereof, wherein said THAP-2 to THAP11 or
THAP-0 portion displays a THAP-2 to THAP11 or THAP-0 activity described herein.
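The proviso "to the extent that the length of said span is consistent with the length of the SEQ ID NO" excludes span lengths longer than the sequence itself. A minimal Python sketch of that rule follows; the function names are hypothetical, and the 693-nucleotide length used in the example is the THAP-9 cDNA length stated above for SEQ ID NO: 168.

```python
# Span lengths (in nucleotides) recited in the paragraph above.
NUCLEOTIDE_SPANS = (12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90,
                    100, 150, 200, 500, 1000)


def consistent_spans(seq_length: int, spans=NUCLEOTIDE_SPANS) -> list:
    """Keep only the span lengths that fit within a sequence of seq_length."""
    return [s for s in spans if s <= seq_length]


def window_count(seq_length: int, span: int) -> int:
    """Number of distinct contiguous spans of a given length (one per start position)."""
    return max(seq_length - span + 1, 0)


thap9_cdna_length = 693  # length stated for SEQ ID NO: 168
allowed = consistent_spans(thap9_cdna_length)
print(allowed[-1], window_count(thap9_cdna_length, 12))
```

For a 693-nucleotide sequence the 1000-nucleotide span drops out, the longest consistent span is 500 nucleotides, and there are 682 distinct 12-nucleotide spans.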
A THAP-2 to THAP11 or THAP-0 variant nucleic acid may, for example, encode a
biologically active THAP-2 to THAP11 or THAP-0 protein comprising at least 1, 2, 3, 5, 10, 20 or
30 amino acid changes from the respective sequence selected from the group consisting of SEQ ID
NOs: 4-14, 17-21, 23-40, 42-56, 58-98 and 100-114, or may encode a biologically active THAP-2 to
THAP11 or THAP-0 protein comprising at least 1%, 2%, 3%, 5%, 8%, 10% or 15% changes in
amino acids from the respective sequence of SEQ ID NOs: 4-14, 17-21, 23-40, 42-56, 58-98 and
100-114.
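Counting amino acid changes and expressing them as a percentage of the reference length can be sketched as follows. The sketch assumes a substitution-only variant of the same length as the reference (insertions and deletions would first require an alignment); the example sequences are toy strings, not THAP sequences, and the function name is hypothetical.

```python
def substitution_changes(reference: str, variant: str):
    """Return (list of (position, ref_residue, new_residue), percent changed).

    Positions are 1-based. Assumes an equal-length, substitution-only variant.
    """
    if len(reference) != len(variant):
        raise ValueError("variant must have the same length as the reference")
    if not reference:
        raise ValueError("reference must not be empty")
    changes = [(i + 1, r, v)
               for i, (r, v) in enumerate(zip(reference, variant)) if r != v]
    percent = 100.0 * len(changes) / len(reference)
    return changes, percent


changes, percent = substitution_changes("MKTAYIAKQR", "MKTAYLAKQR")
print(changes, percent)
```

One substitution in a 10-residue sequence is a 10% change. For a protein of approximately 309 residues, such as THAP-7, 3 substitutions amount to about 0.97%, so at least 4 are needed to reach a 1% change.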

The sequences of SEQ ID NOs: 4-14 correspond to the human THAP-2 to THAP11 and
THAP-0 polypeptides, respectively. SEQ ID NOs: 17-21, 23-40, 42-56, 58-98 and 100-114 correspond
to mouse, rat, pig and other orthologs.

Also encompassed by the THAP-2 to THAP11 and THAP-0 nucleic acids of the invention
are nucleic acid molecules which are complementary to the THAP-2 to THAP11 or THAP-0 nucleic
acids described herein. Preferably, a complementary nucleic acid is sufficiently complementary to
the respective nucleotide sequence shown in SEQ ID NOs: 161-171 and 173-175 that it can
hybridize to said nucleotide sequence, thereby forming a stable duplex.
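A fully complementary strand is the reverse complement of the given sequence: A pairs with T and C with G, and the partner strand is read in the opposite 5' to 3' direction. The minimal Python sketch below computes it and checks whether two strands are perfectly complementary. It ignores the mismatches that a stable duplex can tolerate under real hybridization conditions, and the function names are hypothetical.

```python
# Watson-Crick complement for DNA bases; case is preserved.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_perfect_complement(strand_a: str, strand_b: str) -> bool:
    """True if strand_b is exactly the reverse complement of strand_a."""
    return reverse_complement(strand_a).upper() == strand_b.upper()


print(reverse_complement("ATGGCC"))               # GGCCAT
print(is_perfect_complement("ATGGCC", "GGCCAT"))  # True
```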

Another object of the invention is a purified, isolated, or recombinant nucleic acid encoding
a THAP-2 to THAP11 or THAP-0 polypeptide comprising, consisting essentially of, or consisting
of an amino acid sequence selected from the group consisting of SEQ ID NOs: 4-14, 17-21, 23-40,
42-56, 58-98 and 100-114, or fragments thereof, wherein the isolated nucleic acid molecule encodes a
THAP domain or a THAP-2 to THAP11 or THAP-0 target binding region. Preferably, said target
binding region is a protein binding region, preferably a PAR-4 binding region; alternatively, said
target binding region is a DNA binding region. For example, the purified, isolated or recombinant
nucleic acid may comprise a genomic DNA or fragment thereof which encodes a polypeptide
having a sequence selected from the group consisting of SEQ ID NOs: 4-14, 17-21, 23-40, 42-56,
58-98 and 100-114, or a fragment thereof. The purified, isolated or recombinant nucleic acid may
alternatively comprise a cDNA encoding a polypeptide consisting of, consisting essentially of, or
comprising a sequence selected from the group consisting of SEQ ID NOs: 4-14, 17-21, 23-40,
42-56, 58-98 and 100-114, or fragments thereof, wherein the isolated nucleic acid molecule encodes
a THAP domain or a THAP-2 to THAP11 or THAP-0 target binding region. In preferred
embodiments, a THAP-2 to THAP11 or THAP-0 nucleic acid encodes a THAP-2 to THAP11 or
THAP-0 polypeptide comprising at least two THAP-2 to THAP11 or THAP-0 functional domains,
such as, for example, a THAP domain and a THAP-2 to THAP11 or THAP-0 target binding region.

Particularly preferred nucleic acids of the invention include isolated, purified, or
recombinant THAP-2 to THAP11 or THAP-0 nucleic acids comprising, consisting essentially of, or
consisting of a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100,
150, 200 or 250 nucleotides of a sequence selected from the group consisting of the nucleotide
positions coding for the relevant amino acids, as given in SEQ ID NOs: 161-171 and 173-175.
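The nucleotide positions coding for a given run of amino acids follow from the three-nucleotide codon length: residues a to b occupy nucleotides s + 3(a - 1) to s + 3b - 1, where s is the position of the first nucleotide of the start codon. The Python sketch below applies this rule. The `cds_start` parameter is an assumption, because the position of the start codon within each of SEQ ID NOs: 161-171 and 173-175 is not stated in this section; the function name is hypothetical.

```python
def coding_positions(first_aa: int, last_aa: int, cds_start: int = 1) -> tuple:
    """Return 1-based (start, end) nucleotide positions encoding residues
    first_aa..last_aa, given the position of the first base of the start codon."""
    if first_aa < 1 or last_aa < first_aa:
        raise ValueError("require 1 <= first_aa <= last_aa")
    start = cds_start + 3 * (first_aa - 1)
    end = cds_start + 3 * last_aa - 1
    return start, end


# A THAP domain at residues 1 to 90, assuming the coding sequence starts at nucleotide 1.
print(coding_positions(1, 90))  # (1, 270)
```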

In further preferred embodiments, a THAP-2 to THAP11 or THAP-0 nucleic acid
comprises a nucleotide sequence encoding a THAP domain having the consensus amino acid
sequence defined by the formula of SEQ ID NOs: 1-2. A THAP-2 to THAP11 or THAP-0 nucleic acid
may also encode a THAP domain wherein at least about 95%, 90%, 85%, 50-80%, preferably at least
about 60-70%, more preferably at least about 65% of the amino acid residues are identical or
similar amino acids to the THAP consensus domain (SEQ ID NOs: 1-2). The present invention also
embodies isolated, purified, and recombinant polynucleotides which encode a polypeptide
comprising a contiguous span of at least 6 amino acids, preferably at least 8 or 10 amino acids,
more preferably at least 15, 25, 30, 35, 40, 45, 50, 60, 70, 80 or 90 amino acids of
SEQ ID NOs: 1-2.
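The fraction of residues that are identical or similar to the THAP consensus can be computed position by position once a THAP domain has been aligned to the consensus. The Python sketch below assumes an ungapped alignment and uses a hypothetical, simplified grouping of chemically similar amino acids. This grouping is not a standard substitution matrix such as BLOSUM62, the consensus of SEQ ID NOs: 1-2 is not reproduced here, and the example strings are toy sequences.

```python
# Hypothetical, simplified groups of similar amino acids (an assumption).
SIMILARITY_GROUPS = ("ILMV", "FWY", "KRH", "DE", "ST", "NQ", "AG", "C", "P")


def _group_of(residue: str):
    """Return the similarity group containing residue, or None if ungrouped."""
    for group in SIMILARITY_GROUPS:
        if residue in group:
            return group
    return None


def percent_identical_or_similar(query: str, consensus: str) -> float:
    """Percentage of aligned positions where query matches or resembles consensus."""
    if len(query) != len(consensus) or not query:
        raise ValueError("sequences must be aligned to the same non-zero length")
    hits = 0
    for q, c in zip(query.upper(), consensus.upper()):
        group = _group_of(q)
        if q == c or (group is not None and group == _group_of(c)):
            hits += 1
    return 100.0 * hits / len(consensus)


print(percent_identical_or_similar("CSAFGC", "CSAYGC"))  # 100.0 (F and Y similar)
print(percent_identical_or_similar("CPAYGC", "CSAYGC"))  # about 83.3 (P vs S)
```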

The nucleotide sequences determined from the cloning of the THAP-2 to THAP11 and
THAP-0 genes allow for the generation of probes and primers designed for use in identifying
and/or cloning other THAP family members, particularly sequences related to THAP-2 to THAP11
or THAP-0 (e.g., sharing the novel functional domains), as well as THAP-2 to THAP11 or THAP-0
homologues from other species.

A nucleic acid fragment encoding a biologically active portion of a THAP-2 to THAP11 or
THAP-0 protein can be prepared by isolating a portion of a nucleotide sequence selected from the
group consisting of SEQ ID NOs: 161-171 and 173-175 which encodes a polypeptide having a
THAP-2 to THAP11 or THAP-0 biological activity (the biological activities of the THAP-family
proteins are described herein), expressing the encoded portion of the THAP-2 to THAP11 or
THAP-0 protein (e.g., by recombinant expression in vitro or in vivo), and assessing the activity of
the encoded portion of the THAP-2 to THAP11 or THAP-0 protein.

The invention further encompasses nucleic acid molecules that differ from the THAP-2 to
THAP11 or THAP-0 nucleotide sequences of the invention due to the degeneracy of the genetic code
but encode the same THAP-2 to THAP11 or THAP-0 protein, or fragment thereof, of the invention.
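Because most amino acids are specified by more than one codon, different nucleotide sequences can encode the same protein. The Python sketch below builds the standard genetic code table and translates two different toy coding sequences into the same peptide. The sequences are illustrative only and are not taken from the THAP cDNAs, and the function name is hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds: str) -> str:
    """Translate a DNA (or RNA) coding sequence using the standard genetic code."""
    cds = cds.upper().replace("U", "T")
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


# Two different coding sequences that use synonymous codons for Ala and Leu.
print(translate("ATGGCTTTA"), translate("ATGGCCCTG"))  # MAL MAL
```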

In addition to the THAP-2 to THAP11 or THAP-0 nucleotide sequences described above, it
will be appreciated by those skilled in the art that DNA sequence polymorphisms that lead to
changes in the amino acid sequence of the respective THAP-2 to THAP11 or THAP-0 protein may
exist within a population (e.g., the human population). Such genetic polymorphism may exist
among individuals within a population due to natural allelic variation. Such natural allelic variations
can typically result in 1-5% variance in the nucleotide sequence of a particular THAP-2 to THAP11
or THAP-0 gene.

Nucleic acid molecules corresponding to natural allelic variants and homologues of the
THAP-2 to THAP11 or THAP-0 nucleic acids of the invention can be isolated based on their
homology to the THAP-2 to THAP11 or THAP-0 nucleic acids disclosed herein, using the cDNAs
disclosed herein, or a portion thereof, as a hybridization probe according to standard hybridization
techniques under stringent hybridization conditions.

Probes based on the THAP-2 to THAP11 or THAP-0 nucleotide sequences can be used to
detect transcripts or genomic sequences encoding the same or homologous proteins. In preferred
embodiments, the probe further comprises a label group attached thereto; e.g., the label group can
be a radioisotope, a fluorescent compound, an enzyme, or an enzyme co-factor. Such probes can be
used as part of a diagnostic test kit for identifying cells or tissue which misexpress a THAP-2 to
THAP11 or THAP-0 protein, such as by measuring a level of a THAP-2 to THAP11 or
THAP-0-encoding nucleic acid in a sample of cells from a subject, e.g., detecting THAP-2 to
THAP11 or THAP-0 mRNA levels or determining whether a genomic THAP-2 to THAP11 or
THAP-0 gene has been mutated or deleted.
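In silico, the principle of probe detection can be illustrated by locating the sites in a target sequence to which a probe would pair, i.e., the occurrences of the probe's reverse complement. The Python sketch below finds exact, perfectly paired sites only. Real hybridization under stringent conditions depends on probe length, base composition, temperature and salt concentration, and can tolerate some mismatches, so this is a simplification; the function names and sequences are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def probe_binding_sites(probe: str, target: str) -> list:
    """Return 1-based start positions in target (strand as given) where the
    probe would pair perfectly, i.e. where its reverse complement occurs."""
    site = reverse_complement(probe.upper())
    target = target.upper()
    hits = []
    pos = target.find(site)
    while pos != -1:
        hits.append(pos + 1)
        pos = target.find(site, pos + 1)  # also report overlapping sites
    return hits


print(probe_binding_sites("GGCCAT", "TTATGGCCTT"))  # [3]
```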

THAP-2 to THAP11 and THAP-0 Polypeptides 

The term "THAP-2 to THAP11 or THAP-0 polypeptides" is used herein to embrace all of
the proteins and polypeptides of the present invention relating to THAP-2, THAP-3, THAP-4,
THAP-5, THAP-6, THAP-7, THAP-8, THAP-9, THAP10, THAP11 and THAP-0. Also forming
part of the invention are polypeptides encoded by the polynucleotides of the invention, as well as
fusion polypeptides comprising such polypeptides. The invention embodies THAP-2 to THAP11 or
THAP-0 proteins from humans, including isolated or purified THAP-2 to THAP11 or THAP-0
proteins consisting of, consisting essentially of, or comprising a sequence selected from the group
consisting of SEQ ID NOs: 4-14, 17-21, 23-40, 42-56, 58-98 and 100-114.

The invention concerns the polypeptide encoded by a nucleotide sequence selected from the 
group consisting of SEQ ID NOs: 161-171, 172-175 and a complementary sequence thereof and a 
fragment thereof. 

The present invention embodies isolated, purified, and recombinant polypeptides
comprising a contiguous span of at least 6 amino acids, preferably at least 8 to 10 amino acids,
more preferably at least 12, 15, 20, 25, 30, 40, 50, 100, 150, 200, 300 or 500 amino acids (to the
extent that said span is consistent with the particular SEQ ID NO) of a sequence selected from the
group consisting of SEQ ID NOs: 4-14, 17-21, 23-40, 42-56, 58-98 and 100-114. In other preferred
embodiments, the contiguous stretch of amino acids comprises the site of a mutation or functional
mutation, including a deletion, addition, swap or truncation of the amino acids in the THAP-2 to
THAP11 or THAP-0 protein sequence.

One aspect of the invention pertains to isolated THAP-2 to THAP11 and THAP-0 proteins,
and biologically active portions thereof, as well as polypeptide fragments suitable for use as
immunogens to raise anti-THAP-2 to THAP11 or THAP-0 antibodies. In one embodiment, native
THAP-2 to THAP11 or THAP-0 proteins can be isolated from cells or tissue sources by an
appropriate purification scheme using standard protein purification techniques. In another
embodiment, THAP-2 to THAP11 or THAP-0 proteins are produced by recombinant DNA
techniques. As an alternative to recombinant expression, a THAP-2 to THAP11 or THAP-0 protein
or polypeptide can be synthesized chemically using standard peptide synthesis techniques.

Biologically active portions of a THAP-2 to THAP11 or THAP-0 protein include peptides
comprising amino acid sequences sufficiently homologous to or derived from the amino acid
sequence of the THAP-2 to THAP11 or THAP-0 protein, e.g., an amino acid sequence shown in
SEQ ID NOs: 4-14, 17-21, 23-40, 42-56, 58-98 or 100-114, which include fewer amino acids than
the respective full-length THAP-2 to THAP11 or THAP-0 protein and exhibit at least one activity of
the THAP-2 to THAP11 or THAP-0 protein. The present invention also embodies isolated,
purified, and recombinant portions or fragments of a THAP-2 to THAP11 or THAP-0 polypeptide
comprising a contiguous span of at least 6 amino acids, preferably at least 8 to 10 amino acids,
more preferably at least 12, 15, 20, 25, 30, 40, 50, 100, 150, 200, 300 or 500 amino acids (to the
extent that said span is consistent with the particular SEQ ID NO) of a sequence selected from the
group consisting of SEQ ID NOs: 4-14, 17-21, 23-40, 42-56, 58-98 and 100-114. Also
encompassed are THAP-2 to THAP11 or THAP-0 polypeptides which comprise between 10 and
20, between 20 and 50, between 30 and 60, between 50 and 100, or between 100 and 200 amino
acids of a sequence selected from the group consisting of SEQ ID NOs: 4-14, 17-21, 23-40, 42-56,
58-98 and 100-114. In other preferred embodiments, the contiguous stretch of amino acids
comprises the site of a mutation or functional mutation, including a deletion, addition, swap or
truncation of the amino acids in the THAP-2 to THAP11 or THAP-0 protein sequence.

A biologically active THAP-2 to THAP11 or THAP-0 protein may, for example, comprise
at least 1, 2, 3, 5, 10, 20 or 30 amino acid changes from the sequence of SEQ ID NOs: 4-14, 17-21,
23-40, 42-56, 58-98 or 100-114, or at least 1%, 2%, 3%, 5%, 8%, 10% or 15% changes in amino
acids from the sequence of SEQ ID NOs: 4-14, 17-21, 23-40, 42-56, 58-98 or 100-114.

In a preferred embodiment, the THAP-2 protein comprises, consists essentially of, or
consists of a THAP-2 THAP domain, preferably having the amino acid sequence of amino acid
positions 1 to 89 shown in SEQ ID NO: 4, or fragments or variants thereof. The invention also
concerns the polypeptide encoded by the THAP-2 nucleotide sequences of the invention, or a
complementary sequence thereof or a fragment thereof. The present invention thus also embodies
isolated, purified, and recombinant polypeptides comprising, consisting essentially of, or consisting
of a contiguous span of at least 6 amino acids, preferably at least 8 to 10 amino acids, more
preferably at least 12, 15, 20, 25, 30, 40, 50, 70, 80 or 89 amino acids of a sequence comprising
amino acid positions 1 to 89 of SEQ ID NO: 4. In another aspect, a THAP-2 polypeptide may
comprise a THAP domain wherein at least about 95%, 90%, 85%, 50-80%, preferably at least about
60-70%, more preferably at least about 65% of the amino acid residues are identical or similar
amino acids to the THAP consensus domain (SEQ ID NOs: 1-2). Also encompassed by the
present invention are isolated or purified nucleic acids encoding a THAP-2 polypeptide comprising,
consisting essentially of, or consisting of a THAP domain at amino acid positions 1 to 89 shown in
SEQ ID NO: 4, or fragments or variants thereof. Preferably, said THAP-2 polypeptide comprises a
PAR-4 binding domain and/or a DNA binding domain.

In a preferred embodiment, the THAP-3 protein comprises, consists essentially of, or
consists of a THAP-3 THAP domain, preferably having the amino acid sequence of amino acid
positions 1 to 89 shown in SEQ ID NO: 5, or fragments or variants thereof. The invention also
concerns the polypeptide encoded by the THAP-3 nucleotide sequences of the invention, or a
complementary sequence thereof or a fragment thereof. The present invention thus also embodies
isolated, purified, and recombinant polypeptides comprising, consisting essentially of, or consisting
of a contiguous span of at least 6 amino acids, preferably at least 8 to 10 amino acids, more
preferably at least 12, 15, 20, 25, 30, 40, 50, 70, 80 or 89 amino acids of a sequence comprising
amino acid positions 1 to 89 of SEQ ID NO: 5. In another aspect, a THAP-3 polypeptide may
comprise a THAP domain wherein at least about 95%, 90%, 85%, 50-80%, preferably at least about
60-70%, more preferably at least about 65% of the amino acid residues are identical or similar
amino acids to the THAP consensus domain (SEQ ID NOs: 1-2). Also encompassed by the
present invention are isolated or purified nucleic acids encoding a THAP-3 polypeptide comprising,
consisting essentially of, or consisting of a THAP domain at amino acid positions 1 to 89 shown in
SEQ ID NO: 5, or fragments or variants thereof. Preferably, said THAP-3 polypeptide comprises a
PAR-4 binding domain and/or a DNA binding domain.

In a preferred embodiment, the THAP-4 protein comprises, consists essentially of, or
consists of a THAP-4 THAP domain, preferably having the amino acid sequence of amino acid
positions 1 to 90 shown in SEQ ID NO: 6, or fragments or variants thereof. The invention also
concerns the polypeptide encoded by the THAP-4 nucleotide sequences of the invention, or a
complementary sequence thereof or a fragment thereof. The present invention thus also embodies
isolated, purified, and recombinant polypeptides comprising, consisting essentially of, or consisting
of a contiguous span of at least 6 amino acids, preferably at least 8 to 10 amino acids, more
preferably at least 12, 15, 20, 25, 30, 40, 50, 70, 80 or 90 amino acids of a sequence comprising
amino acid positions 1 to 90 of SEQ ID NO: 6. In another aspect, a THAP-4 polypeptide may
comprise a THAP domain wherein at least about 95%, 90%, 85%, 50-80%, preferably at least about
60-70%, more preferably at least about 65% of the amino acid residues are identical or similar
amino acids to the THAP consensus domain (SEQ ID NOs: 1-2). Also encompassed by the
present invention are isolated or purified nucleic acids encoding a THAP-4 polypeptide comprising,
consisting essentially of, or consisting of a THAP domain at amino acid positions 1 to 90 shown in
SEQ ID NO: 6, or fragments or variants thereof.

In a preferred embodiment, the THAP-5 protein comprises, consists essentially of, or
consists of a THAP-5 THAP domain, preferably having the amino acid sequence of amino acid
positions 1 to 90 shown in SEQ ID NO: 7, or fragments or variants thereof. The invention also
concerns the polypeptide encoded by the THAP-5 nucleotide sequences of the invention, or a
complementary sequence thereof or a fragment thereof. The present invention thus also embodies
isolated, purified, and recombinant polypeptides comprising, consisting essentially of, or consisting
of a contiguous span of at least 6 amino acids, preferably at least 8 to 10 amino acids, more
preferably at least 12, 15, 20, 25, 30, 40, 50, 70, 80 or 90 amino acids of a sequence comprising
amino acid positions 1 to 90 of SEQ ID NO: 7. In another aspect, a THAP-5 polypeptide may
comprise a THAP domain wherein at least about 95%, 90%, 85%, 50-80%, preferably at least about
60-70%, more preferably at least about 65% of the amino acid residues are identical or similar
amino acids to the THAP consensus domain (SEQ ID NOs: 1-2). Also encompassed by the
present invention are isolated or purified nucleic acids encoding a THAP-5 polypeptide comprising,
consisting essentially of, or consisting of a THAP domain at amino acid positions 1 to 90 shown in
SEQ ID NO: 7, or fragments or variants thereof.

In a preferred embodiment, the THAP-6 protein comprises, consists essentially of, or
consists of a THAP-6 THAP domain, preferably having the amino acid sequence of amino acid
positions 1 to 90 shown in SEQ ID NO: 8, or fragments or variants thereof. The invention also
concerns the polypeptide encoded by the THAP-6 nucleotide sequences of the invention, or a
complementary sequence thereof or a fragment thereof. The present invention thus also embodies
isolated, purified, and recombinant polypeptides comprising, consisting essentially of, or consisting
of a contiguous span of at least 6 amino acids, preferably at least 8 to 10 amino acids, more
preferably at least 12, 15, 20, 25, 30, 40, 50, 70, 80 or 90 amino acids of a sequence comprising
amino acid positions 1 to 90 of SEQ ID NO: 8. In another aspect, a THAP-6 polypeptide may
comprise a THAP domain wherein at least about 95%, 90%, 85%, 50-80%, preferably at least about
60-70%, more preferably at least about 65% of the amino acid residues are identical or similar
amino acids to the THAP consensus domain (SEQ ID NOs: 1-2). Also encompassed by the
present invention are isolated or purified nucleic acids encoding a THAP-6 polypeptide comprising,
consisting essentially of, or consisting of a THAP domain at amino acid positions 1 to 90 shown in
SEQ ID NO: 8, or fragments or variants thereof.

In a preferred embodiment, the THAP-7 protein comprises, consists essentially of, or
consists of a THAP-7 THAP domain, preferably having the amino acid sequence of amino acid
positions 1 to 90 shown in SEQ ID NO: 9, or fragments or variants thereof. The invention also
concerns the polypeptide encoded by the THAP-7 nucleotide sequences of the invention, or a
complementary sequence thereof or a fragment thereof. The present invention thus also embodies
isolated, purified, and recombinant polypeptides comprising, consisting essentially of, or consisting
of a contiguous span of at least 6 amino acids, preferably at least 8 to 10 amino acids, more
preferably at least 12, 15, 20, 25, 30, 40, 50, 70, 80 or 90 amino acids of a sequence comprising
amino acid positions 1 to 90 of SEQ ID NO: 9. In another aspect, a THAP-7 polypeptide may
comprise a THAP domain wherein at least about 95%, 90%, 85%, 50-80%, preferably at least about
60-70%, more preferably at least about 65% of the amino acid residues are identical or similar
amino acids to the THAP consensus domain (SEQ ID NOs: 1-2). Also encompassed by the
present invention are isolated or purified nucleic acids encoding a THAP-7 polypeptide comprising,
consisting essentially of, or consisting of a THAP domain at amino acid positions 1 to 90 shown in
SEQ ID NO: 9, or fragments or variants thereof.

In a preferred embodiment, the THAP-8 protein comprises, consists essentially of, or
consists of a THAP-8 THAP domain, preferably having the amino acid sequence of amino acid
positions 1 to 92 shown in SEQ ID NO: 10, or fragments or variants thereof. The invention also
concerns the polypeptide encoded by the THAP-8 nucleotide sequences of the invention, or a
complementary sequence thereof or a fragment thereof. The present invention thus also embodies
isolated, purified, and recombinant polypeptides comprising, consisting essentially of, or consisting
of a contiguous span of at least 6 amino acids, preferably at least 8 to 10 amino acids, more
preferably at least 12, 15, 20, 25, 30, 40, 50, 70, 80 or 90 amino acids of a sequence comprising
amino acid positions 1 to 92 of SEQ ID NO: 10. In another aspect, a THAP-8 polypeptide may
comprise a THAP domain wherein at least about 95%, 90%, 85%, 50-80%, preferably at least about
60-70%, more preferably at least about 65% of the amino acid residues are identical or similar
amino acids to the THAP consensus domain (SEQ ID NOs: 1-2). Also encompassed by the
present invention are isolated or purified nucleic acids encoding a THAP-8 polypeptide comprising,
consisting essentially of, or consisting of a THAP domain at amino acid positions 1 to 92 shown in
SEQ ID NO: 10, or fragments or variants thereof.

In a preferred embodiment, the THAP-9 protein comprises, consists essentially of, or
consists of a THAP-9 THAP domain, preferably having the amino acid sequence of amino acid
positions 1 to 92 shown in SEQ ID NO: 11, or fragments or variants thereof. The invention also
concerns the polypeptide encoded by the THAP-9 nucleotide sequences of the invention, or a
complementary sequence thereof or a fragment thereof. The present invention thus also embodies
isolated, purified, and recombinant polypeptides comprising, consisting essentially of, or consisting
of a contiguous span of at least 6 amino acids, preferably at least 8 to 10 amino acids, more
preferably at least 12, 15, 20, 25, 30, 40, 50, 70, 80 or 90 amino acids of a sequence comprising
amino acid positions 1 to 92 of SEQ ID NO: 11. In another aspect, a THAP-9 polypeptide may
comprise a THAP domain wherein at least about 95%, 90%, 85%, 50-80%, preferably at least about
60-70%, more preferably at least about 65% of the amino acid residues are identical or similar
amino acids to the THAP consensus domain (SEQ ID NOs: 1-2). Also encompassed by the
present invention are isolated or purified nucleic acids encoding a THAP-9 polypeptide comprising,
consisting essentially of, or consisting of a THAP domain at amino acid positions 1 to 92 shown in
SEQ ID NO: 11, or fragments or variants thereof.

In a preferred embodiment, the THAP10 protein comprises, consists essentially of, or 
consists of a THAP 10 THAP domain, preferably having the amino acid sequence of amino acid 
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positions 1 to 90 shown in SEQ ID NO: 12, or fragments or variants thereof. The invention also 
concerns the polypeptide encoded by the THAP10 nucleotide sequences of the invention, or a 
complementary sequence thereof or a fragment thereof. The present invention thus also embodies 
isolated, purified, and recombinant polypeptides comprising, consisting essentially of or consisting 
of a contiguous span of at least 6 amino acids, preferably at least 8 to 10 amino acids more 
preferably at least 12, 15, 20, 25, 30, 40, 50, 70, 80 or 90 amino acids of a sequence comprising 
amino acid positions 1 to 90 of SEQ ID NO: 12. In another aspect, a THAP10 polypeptide may 
comprise a THAP domain wherein at least about 95%, 90%, 85%, 50-80%, preferably at least about 
60-70%, more preferably at least about 65% of the amino acid residues are identical or similar 
amino acids to the THAP domain consensus domain (SEQ ID NOs: 1-2). Also encompassed by the 
present invention are isolated, purified, nucleic acids encoding a THAP10 polypeptide comprising, 
consisting essentially of, or consisting of a THAP domain at amino acid positions 1 to 90 shown in
SEQ ID NO: 12, or fragments or variants thereof. 
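The "contiguous span of at least 6 amino acids" language covers every window of the domain at or above that length; a domain of L residues contains L - k + 1 spans of exactly k residues. A minimal sketch of that enumeration, assuming the domain is supplied as a one-letter string; the toy sequence below is hypothetical and is not SEQ ID NO: 12.

```python
from typing import Iterator, Optional, Tuple


def contiguous_spans(seq: str, min_len: int = 6,
                     max_len: Optional[int] = None) -> Iterator[Tuple[int, int, str]]:
    """Yield (start, end, span) for every contiguous window of `seq` whose
    length is between min_len and max_len. Positions are 1-based, inclusive."""
    n = len(seq)
    upper = n if max_len is None else min(max_len, n)
    for k in range(min_len, upper + 1):
        for i in range(n - k + 1):
            yield i + 1, i + k, seq[i:i + k]


def count_spans(n: int, k: int) -> int:
    """Number of contiguous spans of exactly k residues in a sequence of n residues."""
    return max(n - k + 1, 0)


# A 90-residue domain (positions 1 to 90) has this many 6-residue spans.
print(count_spans(90, 6))  # 85

# Hypothetical 10-residue toy sequence, not a THAP domain.
toy = "MKTAYIAKQR"
for start, end, span in contiguous_spans(toy, min_len=8):
    print(start, end, span)
```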

In a preferred embodiment, the THAP11 protein comprises, consists essentially of, or
consists of a THAP11 THAP domain, preferably having the amino acid sequence of amino acid
positions 1 to 90 shown in SEQ ID NO: 13, or fragments or variants thereof. The invention also
concerns the polypeptide encoded by the THAP11 nucleotide sequences of the invention, or a 
complementary sequence thereof or a fragment thereof. The present invention thus also embodies 
isolated, purified, and recombinant polypeptides comprising, consisting essentially of or consisting 
of a contiguous span of at least 6 amino acids, preferably at least 8 to 10 amino acids, more
preferably at least 12, 15, 20, 25, 30, 40, 50, 70, 80 or 90 amino acids of a sequence comprising 
amino acid positions 1 to 90 of SEQ ID NO: 13. In another aspect, a THAP11 polypeptide may
comprise a THAP domain wherein at least about 95%, 90%, 85%, 50-80%, preferably at least about 
60-70%, more preferably at least about 65% of the amino acid residues are identical or similar 
amino acids to the THAP domain consensus domain (SEQ ID NOs: 1-2). Also encompassed by the
present invention are isolated, purified, nucleic acids encoding a THAP11 polypeptide comprising,
consisting essentially of, or consisting of a THAP domain at amino acid positions 1 to 90 shown in
SEQ ID NO: 13, or fragments or variants thereof. 

In a preferred embodiment, the THAP-0 protein comprises, consists essentially of, or
consists of a THAP-0 THAP domain, preferably having the amino acid sequence of amino acid 
positions 1 to 90 shown in SEQ ID NO: 14, or fragments or variants thereof. The invention also 
concerns the polypeptide encoded by the THAP-0 nucleotide sequences of the invention, or a 
complementary sequence thereof or a fragment thereof. The present invention thus also embodies 
isolated, purified, and recombinant polypeptides comprising, consisting essentially of or consisting 
of a contiguous span of at least 6 amino acids, preferably at least 8 to 10 amino acids, more 
preferably at least 12, 15, 20, 25, 30, 40, 50, 70, 80 or 90 amino acids of a sequence comprising 
amino acid positions 1 to 90 of SEQ ID NO: 14. In another aspect, a THAP-0 polypeptide may 
comprise a THAP domain wherein at least about 95%, 90%, 85%, 50-80%, preferably at least about
60-70%, more preferably at least about 65% of the amino acid residues are identical or similar
amino acids to the THAP domain consensus domain (SEQ ID NOs: 1-2). Also encompassed by the
present invention are isolated, purified, nucleic acids encoding a THAP-0 polypeptide comprising, 
consisting essentially of, or consisting of a THAP domain at amino acid positions 1 to 90 shown in 
SEQ ID NO: 14, or fragments or variants thereof.

In other embodiments, the THAP-2 to THAP11 or THAP-0 protein is substantially
homologous to the sequences of SEQ ID NOs: 4-14, 17-21, 23-40, 42-56, 58-98 or 100-114 and
retains the functional activity of the THAP-2 to THAP11 or THAP-0 protein, yet differs in amino
acid sequence due to natural allelic variation or mutagenesis, as described further herein.
Accordingly, in another embodiment, the THAP-2 to THAP11 or THAP-0 protein is a protein
which comprises an amino acid sequence that shares more than about 60% but less than 100%
homology with the amino acid sequence of SEQ ID NOs: 4-14, 17-21, 23-40, 42-56, 58-98 or
100-114 and retains the functional activity of the THAP-2 to THAP11 or THAP-0 proteins of SEQ ID
NOs: 4-14, 17-21, 23-40, 42-56, 58-98 or 100-114, respectively. Preferably, the protein is at least
about 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 92%, 95%, 97%, 98%, 99% or 99.8%
homologous to SEQ ID NOs: 4-14, 17-21, 23-40, 42-56, 58-98 or 100-114, but is not identical to
SEQ ID NOs: 4-14, 17-21, 23-40, 42-56, 58-98 or 100-114. Preferably, the THAP-2 to THAP11 or
THAP-0 protein is not identical (i.e., it has less than 100% identity) to a naturally occurring
THAP-2 to THAP11 or THAP-0 protein. Percent homology can be determined as further detailed above.
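The "more than about 60% but less than 100%" window can be checked once two sequences are aligned. A minimal sketch, assuming the alignment has already been produced by some external tool and is given as two equal-length gapped strings; it computes percent identity only (not similarity), and the choice of denominator (columns in which at least one sequence has a residue) is an assumption, since several conventions are in use.

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap).

    Assumed convention: identical residue columns divided by columns in which
    at least one sequence has a residue (gap-gap columns are ignored)."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    columns = 0
    for a, b in zip(aln_a, aln_b):
        if a == "-" and b == "-":
            continue
        columns += 1
        if a == b:
            matches += 1
    if columns == 0:
        raise ValueError("alignment has no residue columns")
    return 100.0 * matches / columns


def is_variant_in_range(aln_variant: str, aln_reference: str,
                        threshold: float = 60.0) -> bool:
    """True if identity exceeds `threshold` but the variant is not identical."""
    pid = percent_identity(aln_variant, aln_reference)
    return threshold < pid < 100.0


# Hypothetical toy alignment: one mismatch in ten positions gives 90% identity.
print(percent_identity("ACDEFGHIKL", "ACDEFGHIKM"))
print(is_variant_in_range("ACDEFGHIKL", "ACDEFGHIKM"))
```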

Assessing polypeptides, methods for obtaining variant nucleic acids and polypeptides 
It will be appreciated that by characterizing the function of THAP-family polypeptides, the 
invention further provides methods of testing the activity of, or obtaining, functional fragments and 
variants of THAP-family and THAP domain nucleotide sequences involving providing a variant or 
modified THAP-family or THAP domain nucleic acid and assessing whether a polypeptide encoded 
thereby displays a THAP-family activity of the invention. Encompassed is thus a method of
assessing the function of a THAP-family or THAP domain polypeptide comprising: (a) providing a
THAP-family or THAP domain polypeptide, or a biologically active fragment or homologue
thereof; and (b) testing said THAP-family or THAP domain polypeptide, or biologically active
fragment or homologue thereof, for a THAP-family activity. Any suitable format may be used,
including cell-free, cell-based and in vivo formats. For example, said assay may comprise
expressing a THAP-family or THAP domain nucleic acid in a host cell and observing
THAP-family activity in said cell. In another example, a THAP-family or THAP domain polypeptide,
or a biologically active fragment or homologue thereof, is introduced into a cell, and a THAP-family
activity is observed. THAP-family activity may be any activity as described herein, including: (1)
mediating apoptosis or cell proliferation when expressed in or introduced into a cell, most preferably
inducing or enhancing apoptosis and/or reducing cell proliferation; (2) mediating
apoptosis or cell proliferation of an endothelial cell; (3) mediating apoptosis or cell proliferation of
a hyperproliferative cell; (4) mediating apoptosis or cell proliferation of a CNS cell, preferably a
neuronal or glial cell; or (5) an activity determined in an animal selected from the group consisting
of mediating (preferably inhibiting) angiogenesis, mediating (preferably inhibiting) inflammation,
inhibition of the metastatic potential of cancerous tissue, reduction of tumor burden, increase in
sensitivity to chemotherapy or radiotherapy, killing a cancer cell, inhibition of the growth of a
cancer cell, or induction of tumor regression.

In addition to naturally-occurring allelic variants of the THAP-family or THAP domain 
sequences that may exist in the population, the skilled artisan will appreciate that changes can be 
introduced by mutation into the nucleotide sequences of SEQ ID NOs: 160-171, thereby leading to 
changes in the amino acid sequence of the encoded THAP-family or THAP domain proteins, with 
or without altering the functional ability of the THAP-family or THAP domain proteins. 

Several types of variants are contemplated including 1) one in which one or more of the 
amino acid residues are substituted with a conserved or non-conserved amino acid residue and such 
substituted amino acid residue may or may not be one encoded by the genetic code, or 2) one in 
which one or more of the amino acid residues includes a substituent group, or 3) one in which the 
mutated THAP-family or THAP domain polypeptide is fused with another compound, such as a 
compound to increase the half-life of the polypeptide (for example, polyethylene glycol), or 4) one 
in which the additional amino acids are fused to the mutated THAP-family or THAP domain 
polypeptide, such as a leader or secretory sequence or a sequence which is employed for 
purification of the mutated THAP-family or THAP domain polypeptide or a preprotein sequence. 
Such variants are deemed to be within the scope of those skilled in the art. 

For example, nucleotide substitutions leading to amino acid substitutions can be made in 
the sequences of SEQ ID NOs: 160-175 that do not substantially change the biological activity of 
the protein. An amino acid residue can be altered from the wild-type sequence encoding a
THAP-family or THAP domain polypeptide, or a biologically active fragment or homologue thereof,
without altering the biological activity. In general, amino acid residues that are conserved among
the THAP-family or THAP domain-containing proteins of the present invention are predicted to be
less amenable to alteration. Furthermore, additional conserved amino acid residues may be amino
acids that are conserved between the THAP-family proteins of the present invention.

In one aspect, the invention pertains to nucleic acid molecules encoding THAP family or 
THAP domain polypeptides, or biologically active fragments or homologues thereof that contain 
changes in amino acid residues that are not essential for activity. Such THAP-family proteins differ 
in amino acid sequence from SEQ ID NOs: 1-114 yet retain biological activity. In one 
embodiment, the isolated nucleic acid molecule comprises a nucleotide sequence encoding a 
protein, wherein the protein comprises an amino acid sequence at least about 60% homologous to 
an amino acid sequence selected from the group consisting of SEQ ID NOs: 1-114. Preferably, the
protein encoded by the nucleic acid molecule is at least about 65-70% homologous to an amino acid
sequence selected from the group consisting of SEQ ID NOs: 1-114, more preferably sharing at
least about 75-80% identity with an amino acid sequence selected from the group consisting of SEQ
ID NOs: 1-114, even more preferably sharing at least about 85%, 90%, 92%, 95%, 97%, 98%, 99%
or 99.8% identity with an amino acid sequence selected from the group consisting of SEQ ID NOs:
1-114.

In another aspect, the invention pertains to nucleic acid molecules encoding THAP-family 
proteins that contain changes in amino acid residues that result in increased biological activity, or a 
modified biological activity. In another aspect, the invention pertains to nucleic acid molecules 
encoding THAP-family proteins that contain changes in amino acid residues that are essential for a 
THAP-family activity. Such THAP-family proteins differ in amino acid sequence from SEQ ID
NOs: 1-114 and display reduced activity in, or essentially lack, one or more THAP-family biological
activities. The invention also encompasses a THAP-family or THAP domain polypeptide, or a
biologically active fragment or homologue thereof, which may be useful as a dominant negative
mutant of a THAP-family or THAP domain polypeptide.

An isolated nucleic acid molecule encoding a THAP family or THAP domain polypeptide, 
or a biologically active fragment or homologue thereof homologous to a protein of any one of SEQ 
ID NOs: 1-114 can be created by introducing one or more nucleotide substitutions, additions or 
deletions into the nucleotide sequence of SEQ ID NOs: 1-114 such that one or more amino acid
substitutions, additions or deletions are introduced into the encoded protein. Mutations can be 
introduced into any of SEQ ID NOs: 1-114, by standard techniques, such as site-directed 
mutagenesis and PCR-mediated mutagenesis. For example, conservative amino acid substitutions 
may be made at one or more predicted non-essential amino acid residues. A "conservative amino 
acid substitution" is one in which the amino acid residue is replaced with an amino acid residue 
having a similar side chain. Families of amino acid residues having similar side chains have been 
defined in the art. These families include amino acids with basic side chains (e.g., lysine, arginine, 
histidine), acidic side chains (e.g., aspartic acid, glutamic acid), uncharged polar side chains (e.g., 
glycine, asparagine, glutamine, serine, threonine, tyrosine, cysteine), nonpolar side chains (e.g., 
alanine, valine, leucine, isoleucine, proline, phenylalanine, methionine, tryptophan), beta-branched 
side chains (e.g., threonine, valine, isoleucine) and aromatic side chains (e.g., tyrosine,
phenylalanine, tryptophan, histidine). Thus, a predicted nonessential amino acid residue in a THAP 
family or THAP domain polypeptide, or a biologically active fragment or homologue thereof may 
be replaced with another amino acid residue from the same side chain family. Alternatively, in 
another embodiment, mutations can be introduced randomly along all or part of a THAP-family or 
THAP domain coding sequence, such as by saturation mutagenesis, and the resultant mutants can 
be screened for THAP-family biological activity to identify mutants that retain activity. Following 
mutagenesis of one of SEQ ID NOs: 1-114, the encoded protein can be expressed recombinantly
and the activity of the protein can be determined. 
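The side-chain families listed above can be encoded directly to propose conservative replacements at a chosen position. A minimal sketch, assuming one-letter residue codes; it uses exactly the families given in the text (so glycine sits with the uncharged polar residues and histidine is both basic and aromatic), and it does not decide which positions are non-essential, which must be predicted separately. The helper names and the toy sequence are hypothetical.

```python
# Side-chain families exactly as listed in the text; a residue may fall in
# more than one family (e.g. histidine is both basic and aromatic).
SIDE_CHAIN_FAMILIES = {
    "basic": set("KRH"),
    "acidic": set("DE"),
    "uncharged_polar": set("GNQSTYC"),
    "nonpolar": set("AVLIPFMW"),
    "beta_branched": set("TVI"),
    "aromatic": set("YFWH"),
}


def families_of(residue: str) -> set:
    """Names of all side-chain families that contain this residue."""
    return {name for name, members in SIDE_CHAIN_FAMILIES.items() if residue in members}


def is_conservative(original: str, replacement: str) -> bool:
    """True if the residues differ but share at least one side-chain family."""
    return original != replacement and bool(families_of(original) & families_of(replacement))


def conservative_options(residue: str) -> list:
    """All residues that count as a conservative substitution for `residue`."""
    options = set()
    for members in SIDE_CHAIN_FAMILIES.values():
        if residue in members:
            options |= members
    options.discard(residue)
    return sorted(options)


def substitute(seq: str, position: int, new_residue: str) -> str:
    """Return `seq` with the residue at 1-based `position` replaced."""
    if not 1 <= position <= len(seq):
        raise IndexError("position outside sequence")
    return seq[:position - 1] + new_residue + seq[position:]


# Hypothetical toy sequence; position 3 is assumed to be non-essential.
toy = "MKDLT"
for aa in conservative_options(toy[2]):
    print(substitute(toy, 3, aa))
```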

In a preferred embodiment, a mutant THAP-family or THAP domain polypeptide, or a
biologically active fragment or homologue thereof, encoded by a THAP-family or THAP domain
nucleic acid of the invention can be assayed for a THAP-family activity in any suitable assay,
examples of which are provided herein.

The invention also provides THAP-family or THAP domain chimeric or fusion proteins. As 
used herein, a THAP-family or THAP domain "chimeric protein" or "fusion protein" comprises a 
THAP-family or THAP domain polypeptide of the invention operatively linked, preferably fused in 
frame, to a non-THAP-family or non-THAP domain polypeptide. In a preferred embodiment, a 
THAP-family or THAP domain fusion protein comprises at least one biologically active portion of 
a THAP-family or THAP domain protein. In another preferred embodiment, a THAP-family fusion 
protein comprises at least two biologically active portions of a THAP-family protein. For example, 
in one embodiment, the fusion protein is a GST-THAP-family fusion protein in which the THAP- 
family sequences are fused to the C-terminus of the GST sequences. Such fusion proteins can 
facilitate the purification of recombinant THAP-family polypeptides. In another embodiment, the 
fusion protein is a THAP-family protein containing a heterologous signal sequence at its N- 
terminus, such as for example to allow for a desired cellular localization in a certain host cell. 

The THAP-family or THAP domain fusion proteins of the invention can be incorporated 
into pharmaceutical compositions and administered to a subject in vivo. Moreover, the
THAP-family or THAP domain fusion proteins of the invention can be used as immunogens to produce
anti-THAP-family or anti-THAP domain antibodies in a subject, to purify THAP-family or
THAP domain ligands, and in screening assays to identify molecules which inhibit the interaction of
a THAP-family or THAP domain protein with a THAP-family or THAP domain target molecule.

Furthermore, isolated peptidyl portions of the subject THAP-family or THAP domain 
proteins can also be obtained by screening peptides recombinantly produced from the 
corresponding fragment of the nucleic acid encoding such peptides. In addition, fragments can be 
chemically synthesized using techniques known in the art, such as conventional Merrifield solid
phase Fmoc or t-Boc chemistry. For example, a THAP-family or THAP domain protein of the
present invention may be arbitrarily divided into fragments of desired length with no overlap of the 
fragments, or preferably divided into overlapping fragments of a desired length. The fragments can 
be produced (recombinantly or by chemical synthesis) and tested to identify those peptidyl 
fragments which can function as either agonists or antagonists of a THAP-family protein activity, 
such as by microinjection assays or in vitro protein binding assays. In an illustrative embodiment, 
peptidyl portions of a THAP-family protein, such as a THAP domain or a THAP-family target 
binding region (e.g., PAR4 in the case of THAP1, THAP-2 and THAP-3), can be tested for
THAP-family activity by expression as thioredoxin fusion proteins, each of which contains a discrete
fragment of the THAP-family protein (see, for example, U.S. Patents 5,270,181 and 5,292,646; and
PCT publication WO 94/02502).
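Dividing a protein into fragments "of desired length with no overlap" or into overlapping fragments is a simple tiling. A minimal sketch, assuming one-letter sequences and 1-based inclusive coordinates; the function name is hypothetical, and the choice to let the last fragment be shorter (rather than shifting it back to full length) is an assumption made so that non-overlapping tiling truly has no overlap.

```python
from typing import List, Tuple


def tile_fragments(seq: str, length: int, overlap: int = 0) -> List[Tuple[int, int, str]]:
    """Split `seq` into fragments of `length` residues; consecutive fragments
    share `overlap` residues. The final fragment may be shorter so that the
    C-terminus is covered without extra overlap."""
    if length <= 0 or not 0 <= overlap < length:
        raise ValueError("need length > 0 and 0 <= overlap < length")
    if not seq:
        return []
    step = length - overlap
    fragments = []
    start = 0
    while True:
        end = min(start + length, len(seq))
        fragments.append((start + 1, end, seq[start:end]))
        if end == len(seq):
            break
        start += step
    return fragments


# Hypothetical toy sequence: non-overlapping 4-mers, then 4-mers overlapping by 2.
print(tile_fragments("ABCDEFGHIJ", 4))
print(tile_fragments("ABCDEFGHIJ", 4, overlap=2))
```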

The present invention also pertains to variants of the THAP-family or THAP domain 
proteins which function as either THAP-family or THAP domain mimetics or as THAP-family or 
THAP domain inhibitors. Variants of the THAP-family or THAP domain proteins can be generated 
by mutagenesis, e.g., discrete point mutation or truncation of a THAP-family or THAP domain 
protein. An agonist of a THAP-family or THAP domain protein can retain substantially the same, 
or a subset, of the biological activities of the naturally occurring form of a THAP-family or THAP 
domain protein. An antagonist of a THAP-family or THAP domain protein can inhibit one or more 
of the activities of the naturally occurring form of the THAP-family or THAP domain protein by, 
for example, competitively inhibiting the association of a THAP-family or THAP domain protein 
with a THAP-family target molecule. Thus, specific biological effects can be elicited by treatment 
with a variant of limited function. In one embodiment, variants of a THAP-family or THAP domain 
protein which function as either THAP-family or THAP domain agonists (mimetics) or as THAP- 
family or THAP domain antagonists can be identified by screening combinatorial libraries of 
mutants, e.g., truncation mutants, of a THAP-family or THAP domain protein for THAP-family or 
THAP domain protein agonist or antagonist activity. In one embodiment, a variegated library of 
THAP-family variants is generated by combinatorial mutagenesis at the nucleic acid level and is 
encoded by a variegated gene library. A variegated library of THAP-family variants can be 
produced by, for example, enzymatically ligating a mixture of synthetic oligonucleotides into gene 
sequences such that a degenerate set of potential THAP-family sequences is expressible as 
individual polypeptides, or alternatively, as a set of larger fusion proteins (e.g., for phage display) 
containing the set of THAP-family sequences therein. There are a variety of methods which can be 
used to produce libraries of potential THAP-family variants from a degenerate oligonucleotide 
sequence. Chemical synthesis of a degenerate gene sequence can be performed in an automatic 
DNA synthesizer, and the synthetic gene then ligated into an appropriate expression vector. Use of 
a degenerate set of genes allows for the provision, in one mixture, of all of the sequences encoding 
the desired set of potential THAP-family sequences. 

In addition, libraries of fragments of a THAP-family or THAP domain protein coding 
sequence can be used to generate a variegated population of THAP-family or THAP domain 
fragments for screening and subsequent selection of variants of a THAP-family or THAP domain 
protein. In one embodiment, a library of coding sequence fragments can be generated by treating a 
double stranded PCR fragment of a THAP-family coding sequence with a nuclease under 
conditions wherein nicking occurs only about once per molecule, denaturing the double stranded 
DNA, renaturing the DNA to form double stranded DNA which can include sense/antisense pairs 
from different nicked products, removing single stranded portions from reformed duplexes by 
treatment with S1 nuclease, and ligating the resulting fragment library into an expression vector.
By this method, an expression library can be derived which encodes N-terminal, C-terminal and
internal fragments of various sizes of the THAP-family protein. 

Modified THAP-family or THAP domain proteins can be used for such purposes as 
enhancing therapeutic or prophylactic efficacy, or stability (e.g., ex vivo shelf life and resistance to 
proteolytic degradation in vivo). Such modified peptides, when designed to retain at least one 
activity of the naturally occurring form of the protein, are considered functional equivalents of the 
THAP-family or THAP domain protein described in more detail herein. Such modified peptides can
be produced, for instance, by amino acid substitution, deletion, or addition. 

Whether a change in the amino acid sequence of a peptide results in a functional THAP- 
family or THAP domain homolog (e.g. functional in the sense that it acts to mimic or antagonize 
the wild-type form) can be readily determined by assessing the ability of the variant peptide to 
produce a response in cells in a fashion similar to the wild-type THAP-family or THAP domain 
protein or competitively inhibit such a response. Peptides in which more than one replacement has 
taken place can readily be tested in the same manner. 

This invention further contemplates a method of generating sets of combinatorial mutants 
of the presently disclosed THAP-family or THAP domain proteins, as well as truncation and 
fragmentation mutants, and is especially useful for identifying potential variant sequences which are 
functional in binding to a THAP-family or THAP domain target protein but differ from a
wild-type form of the protein by, for example, efficacy, potency and/or intracellular half-life. One
purpose for screening such combinatorial libraries is, for example, to isolate novel THAP-family or
THAP domain homologs which function as either an agonist or an antagonist of the biological
activities of the wild-type protein, or alternatively, possess novel activities altogether. For
example, mutagenesis can give rise to THAP-family homologs which have intracellular half-lives
dramatically different from that of the corresponding wild-type protein. The altered protein can be
rendered either more stable or less stable to proteolytic degradation or other cellular processes which
result in the destruction or inactivation of a THAP-family protein. Such THAP-family
homologs, and the genes which encode them, can be utilized to alter the envelope of expression for
a particular recombinant THAP-family protein by modulating the half-life of the recombinant
protein. For instance, a short half-life can give rise to more transient biological effects associated
with a particular recombinant THAP-family protein and, when part of an inducible expression
system, can allow tighter control of recombinant protein levels within a cell. As above, such
proteins, and particularly their recombinant nucleic acid constructs, can be used in gene therapy
protocols.
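Why a shorter half-life tightens control can be seen with simple arithmetic. A minimal sketch, assuming first-order degradation and no new synthesis after an inducible promoter is switched off; under that assumption the remaining fraction after time t is 0.5^(t / t_half), and the time to reach a fraction f is t_half · log2(1/f). The half-life values used below are hypothetical.

```python
import math


def remaining_fraction(t_hours: float, half_life_hours: float) -> float:
    """Fraction of protein left t_hours after synthesis stops (first-order decay)."""
    if half_life_hours <= 0:
        raise ValueError("half-life must be positive")
    return 0.5 ** (t_hours / half_life_hours)


def time_to_fraction(fraction: float, half_life_hours: float) -> float:
    """Hours until the protein level falls to `fraction` of its starting value."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if half_life_hours <= 0:
        raise ValueError("half-life must be positive")
    return half_life_hours * math.log2(1 / fraction)


# Hypothetical half-lives for a destabilized and a stable variant.
for label, t_half in (("short-lived variant", 2.0), ("stable variant", 20.0)):
    print(label, round(time_to_fraction(0.1, t_half), 1), "h to fall to 10%")
```

Because the time to clear the protein scales linearly with its half-life, a variant with a tenfold shorter half-life is, under these assumptions, cleared ten times faster after induction stops.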

In an illustrative embodiment of this method, the amino acid sequences for a population of 
THAP-family homologs or other related proteins are aligned, preferably to promote the highest 
homology possible. Such a population of variants can include, for example, THAP-family
homologs from one or more species, or THAP-family homologs from the same species but which
differ due to mutation. Amino acids which appear at each position of the aligned sequences are
selected to create a degenerate set of combinatorial sequences. There are many ways by which the
library of potential THAP-family homologs can be generated from a degenerate oligonucleotide
sequence. Chemical synthesis of a degenerate gene sequence can be carried out in an automatic
DNA synthesizer, and the synthetic genes then ligated into an appropriate gene for expression.
The purpose of a degenerate set of genes is to provide, in one mixture, all of the sequences
encoding the desired set of potential THAP-family sequences. The synthesis of degenerate
oligonucleotides is well known in the art (see, for example, Narang, SA (1983) Tetrahedron 39:3;
Itakura et al. (1981) Recombinant DNA, Proc 3rd Cleveland Sympos. Macromolecules, ed. AG
Walton, Amsterdam: Elsevier pp. 273-289; Itakura et al. (1984) Annu. Rev. Biochem. 53:323;
Itakura et al. (1984) Science 198:1056; Ike et al. (1983) Nucleic Acid Res. 11:477). Such techniques
have been employed in the directed evolution of other proteins (see, for example, Scott et al. (1990)
Science 249:386-390; Roberts et al. (1992) PNAS 89:2429-2433; Devlin et al. (1990) Science
249:404-406; Cwirla et al. (1990) PNAS 87:6378-6382; as well as U.S. Patent Nos. 5,223,409,
5,198,346, and 5,096,815).
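The alignment-to-library step described above can be sketched at the amino acid level. A minimal sketch, assuming the homologs are already aligned as equal-length gapped strings; it collects the residues observed at each position, reports the resulting library size, and enumerates small libraries. It does not design the degenerate codons or oligonucleotides that would actually be synthesized, and the toy alignment and function names are hypothetical.

```python
from itertools import product
from math import prod
from typing import List


def degenerate_positions(aligned: List[str]) -> List[List[str]]:
    """Residues observed at each alignment column ('-' gaps ignored)."""
    if not aligned:
        raise ValueError("need at least one aligned sequence")
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("aligned sequences must have equal length")
    return [sorted({s[i] for s in aligned if s[i] != "-"}) for i in range(length)]


def library_size(columns: List[List[str]]) -> int:
    """Number of distinct combinatorial sequences (all-gap columns skipped)."""
    return prod(len(c) for c in columns if c)


def enumerate_library(columns: List[List[str]]) -> List[str]:
    """Every combination of allowed residues; practical only for small libraries."""
    return ["".join(combo) for combo in product(*[c for c in columns if c])]


# Hypothetical toy alignment of three homologs.
homologs = ["ACDE", "ASDE", "ACNE"]
cols = degenerate_positions(homologs)
print(cols, library_size(cols))
print(enumerate_library(cols))
```

Note that the library contains combinations (here ASNE) that occur in none of the input homologs; that recombination of observed residues is what makes the degenerate set combinatorial.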

Alternatively, other forms of mutagenesis can be utilized to generate a combinatorial 
library, particularly where no other naturally occurring homologs have yet been sequenced. For
example, THAP-family homologs (both agonist and antagonist forms) can be generated and isolated 
from a library by screening using, for example, alanine scanning mutagenesis and the like (Ruf et 
al. (1994) Biochemistry 33:1565-1572; Wang et al. (1994) J Biol. Chem. 269:3095-3099; Balint et
al. (1993) Gene 137:109-118; Grodberg et al. (1993) Eur. J Biochem. 218:597-601; Nagashima et 
al. (1993) J Biol. Chem. 268:2888-2892; Lowman et al. (1991) Biochemistry 30:10832-10838; and 
Cunningham et al. (1989) Science 244:1081-1085), by linker scanning mutagenesis (Gustin et al. 
(1993) Virology 193:653-660; Brown et al. (1992) Mol. Cell Biol. 12:2644-2652; McKnight et al.
(1982) Science 232:316); by saturation mutagenesis (Meyers et al. (1986) Science 232:613); by 
PCR mutagenesis (Leung et al. (1989) Method Cell Mol Biol 1: 1-19); or by random mutagenesis 
(Miller et al. (1992) A Short Course in Bacterial Genetics, CSHL Press, Cold Spring Harbor, NY; 
and Greener et al. (1994) Strategies in Mol Biol 7:32-34). 
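Alanine scanning, the first of the approaches listed above, replaces each residue in turn with alanine so that each variant probes a single side chain. A minimal sketch of generating such a variant set, assuming one-letter sequences; positions that are already alanine are simply skipped here, and the variant labels (wild-type residue, position, A) follow a common naming style rather than anything specified in the text. The toy sequence is hypothetical.

```python
from typing import Dict, Iterable, Optional


def alanine_scan(seq: str, positions: Optional[Iterable[int]] = None) -> Dict[str, str]:
    """Map labels like 'K2A' to single-alanine variants of `seq`.

    `positions` are 1-based; by default every position is scanned.
    Residues that are already alanine are skipped."""
    if positions is None:
        positions = range(1, len(seq) + 1)
    variants = {}
    for pos in positions:
        if not 1 <= pos <= len(seq):
            raise IndexError(f"position {pos} outside sequence")
        wild_type = seq[pos - 1]
        if wild_type == "A":
            continue
        variants[f"{wild_type}{pos}A"] = seq[:pos - 1] + "A" + seq[pos:]
    return variants


# Hypothetical toy sequence.
for label, variant in alanine_scan("MKAL").items():
    print(label, variant)
```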

A wide range of techniques are known in the art for screening gene products of 
combinatorial libraries made by point mutations, as well as for screening cDNA libraries for gene 
products having a certain property. Such techniques will be generally adaptable for rapid screening 
of the gene libraries generated by the combinatorial mutagenesis of THAP-family proteins. The 
most widely used techniques for screening large gene libraries typically comprise cloning the gene
library into replicable expression vectors, transforming appropriate cells with the resulting library of
vectors, and expressing the combinatorial genes under conditions in which detection of a desired
activity facilitates relatively easy isolation of the vector encoding the gene whose product was
detected.

Each of the illustrative assays described below is amenable to high-throughput analysis as
necessary to screen large numbers of degenerate THAP-family or THAP domain sequences created 
by combinatorial mutagenesis techniques. In one screening assay, the candidate gene products are 
displayed on the surface of a cell or viral particle, and the ability of particular cells or viral particles 
to bind a THAP-family target molecule (protein or DNA) via this gene product is detected in a 
"panning assay". For instance, the gene library can be cloned into the gene for a surface membrane 
protein of a bacterial cell, and the resulting fusion protein detected by panning (Ladner et al., WO
88/06630; Fuchs et al. (1991) Bio/Technology 9:1370-1371; and Goward et al. (1992) TIBS 18:136-140).
In a similar fashion, fluorescently labeled THAP-family target can be used to score for
potentially functional THAP-family homologs. Cells can be visually inspected and separated under 
a fluorescence microscope, or, where the morphology of the cell permits, separated by a
fluorescence-activated cell sorter.

In an alternate embodiment, the gene library is expressed as a fusion protein on the surface
of a viral particle. For instance, in the filamentous phage system, foreign peptide sequences can be 
expressed on the surface of infectious phage, thereby conferring two significant benefits. First, 
since these phage can be applied to affinity matrices at very high concentrations, a large number of 
phage can be screened at one time. Second, since each infectious phage displays the combinatorial 
gene product on its surface, if a particular phage is recovered from an affinity matrix in low yield, 
the phage can be amplified by another round of infection. The almost identical E. coli
filamentous phages M13, fd, and f1 are most often used in phage display libraries, as either of the
phage gIII or gVIII coat proteins can be used to generate fusion proteins without disrupting the
ultimate packaging of the viral particle (Ladner et al. PCT publication WO 90/02909; Garrard et al., 
PCT publication WO 92/09690; Marks et al. (1992) J Biol. Chem. 267:16007-16010; Griffiths et al. 
(1993) EMBO J 12:725-734; Clackson et al. (1991) Nature 352:624-628; and Barbas et al. (1992)
PNAS 89:4457-4461). In an illustrative embodiment, the recombinant phage antibody system
(RPAS, Pharmacia Catalog number 27-9400-01) can be easily modified for use in expressing 
THAP-family combinatorial libraries, and the THAP-family phage library can be panned on 
immobilized THAP family target molecule (glutathione immobilized THAP-family target-GST 
fusion proteins or immobilized DNA). Successive rounds of phage amplification and panning can 
greatly enrich for THAP-family homologs which retain an ability to bind a THAP-family target and 
which can subsequently be screened further for biological activities in automated assays, in order to 
distinguish between agonists and antagonists. 
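How successive rounds of panning and amplification "greatly enrich" for binders can be illustrated with a toy model. A minimal sketch, assuming each binder phage is retained on the immobilized target with a fixed probability, each non-binder is carried through as background with a much smaller probability, and amplification between rounds is unbiased; all probabilities below are hypothetical and are not measured values for any THAP-family target.

```python
from typing import List


def panning_rounds(binder_fraction: float, binder_recovery: float,
                   background_recovery: float, rounds: int) -> List[float]:
    """Fraction of binders in the phage pool before and after each round.

    Toy model: binders are retained with probability `binder_recovery`,
    non-binders with `background_recovery`; amplification is assumed unbiased,
    so only the binder fraction carries over to the next round."""
    if not 0 < binder_fraction < 1:
        raise ValueError("binder_fraction must be in (0, 1)")
    if binder_recovery <= 0 or background_recovery <= 0:
        raise ValueError("recoveries must be positive")
    history = [binder_fraction]
    f = binder_fraction
    for _ in range(rounds):
        bound = f * binder_recovery
        background = (1 - f) * background_recovery
        f = bound / (bound + background)
        history.append(f)
    return history


# Hypothetical: one binder per million clones, 10% vs 0.01% retention.
for round_number, fraction in enumerate(panning_rounds(1e-6, 0.1, 1e-4, 4)):
    print(round_number, f"{fraction:.3g}")
```

In this model the per-round enrichment factor is close to the ratio of the two recovery probabilities while binders are rare, which is why a few rounds can take a one-in-a-million clone to a majority of the pool.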

The invention also provides for identification and reduction to functional minimal size of 
the THAP-family domains, particularly a THAP domain of the subject THAP-family to generate 
mimetics, e.g. peptide or non-peptide agents, which are able to disrupt binding of a polypeptide of 
the present invention with a THAP-family target molecule (protein or DNA). Thus, such mutagenic
techniques as described above are also useful to map the determinants of THAP-family proteins 
which participate in protein-protein or protein-DNA interactions involved in, for example, binding 
to a THAP-family or THAP domain target protein or DNA. To illustrate, the critical residues of a 
THAP-family protein which are involved in molecular recognition of the THAP-family target can 
be determined and used to generate THAP-family protein-derived peptidomimetics that
competitively inhibit binding of the THAP-family protein to the THAP-family target. By 
employing, for example, scanning mutagenesis to map the amino acid residues of a particular 
THAP-family protein involved in binding a THAP-family target, peptidomimetic compounds can 
be generated which mimic those residues in binding to a THAP-family target, and which, by 
inhibiting binding of the THAP-family protein to the THAP-family target molecule, can interfere 
with the function of a THAP-family protein in transcriptional regulation of one or more genes. For 
instance, non-hydrolyzable peptide analogs of such residues can be generated using retro-inverso
peptides (e.g., see U.S. Patents 5,116,947 and 5,219,089; and Pallai et al. (1983) Int J Pept Protein
Res 21:84-92), benzodiazepine (e.g., see Freidinger et al. in Peptides: Chemistry and Biology, G.R.
Marshall ed., ESCOM Publisher: Leiden, Netherlands, 1988), azepine (e.g., see Huffman et al. in
Peptides: Chemistry and Biology, G.R. Marshall ed., ESCOM Publisher: Leiden, Netherlands,
1988), substituted gamma lactam rings (Garvey et al. in Peptides: Chemistry and Biology, G.R.
Marshall ed., ESCOM Publisher: Leiden, Netherlands, 1988), keto-methylene pseudopeptides
(Ewenson et al. (1986) J Med Chem 29:295; and Ewenson et al. in Peptides: Structure and Function
(Proceedings of the 9th American Peptide Symposium) Pierce Chemical Co. Rockland, IL, 1985),
β-turn dipeptide cores (Nagai et al. (1985) Tetrahedron Lett 26:647; and Sato et al. (1986) J Chem
Soc Perkin Trans 1:1231), and β-aminoalcohols (Gordon et al. (1985) Biochem Biophys Res
Commun 126:419; and Dann et al. (1986) Biochem Biophys Res Commun 134:71). 
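
Scanning mutagenesis of the kind mentioned above can be planned computationally before any
mutants are made. The following Python sketch is offered only as an illustration: it assumes
alanine scanning as the form of scanning mutagenesis, and the function name `alanine_scan` and the
example sequence are hypothetical. It enumerates every single alanine substitution across a chosen
stretch of a THAP-family sequence.

```python
def alanine_scan(sequence: str, start: int, end: int) -> list[tuple[str, str]]:
    """Return (mutation_name, mutant_sequence) for each non-Ala residue.

    `start` and `end` are 1-based and inclusive, matching protein numbering.
    """
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("scan range lies outside the sequence")
    mutants = []
    for pos in range(start, end + 1):
        wild_type = sequence[pos - 1]
        if wild_type == "A":
            continue  # already alanine; nothing to substitute
        mutant = sequence[: pos - 1] + "A" + sequence[pos:]
        mutants.append((f"{wild_type}{pos}A", mutant))
    return mutants


# Hypothetical five-residue stretch, for illustration only.
for name, seq in alanine_scan("MKCAW", 1, 5):
    print(name, seq)
```

Each returned pair names the substitution in the usual wild-type/position/mutant form (e.g., K2A)
and gives the mutant sequence that would be expressed and tested for loss of target binding.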

An isolated THAP-family or THAP domain protein, or a portion or fragment thereof, can 
be used as an immunogen to generate antibodies that bind THAP-family or THAP domain proteins 
using standard techniques for polyclonal and monoclonal antibody preparation. A full-length 
THAP-family protein can be used or, alternatively, the invention provides antigenic peptide 
fragments of THAP-family or THAP domain proteins for use as immunogens. Any fragment of the 
THAP-family or THAP domain protein which contains at least one antigenic determinant may be 
used to generate antibodies. The antigenic peptide of a THAP-family or THAP domain protein 
comprises at least 8 amino acid residues of an amino acid sequence selected from the group 
consisting of SEQ ID NOs: 1-114 and encompasses an epitope of a THAP-family or THAP 
domain protein such that an antibody raised against the peptide forms a specific immune complex 
with a THAP-family or THAP domain protein. Preferably, the antigenic peptide comprises at least 
10 amino acid residues, more preferably at least 15 amino acid residues, even more preferably at 
least 20 amino acid residues, and most preferably at least 30 amino acid residues. 
Preferred epitopes encompassed by the antigenic peptide are regions of a THAP-family or
THAP domain protein that are located on the surface of the protein, e.g., hydrophilic regions. 
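
Hydrophilic regions suitable as antigenic peptides can be shortlisted with a sliding-window
hydropathy scan. The sketch below assumes the Kyte-Doolittle hydropathy scale and an 8-residue
window (matching the minimum antigenic peptide length above); the function name and test
sequence are illustrative only, and hydropathy is at best a rough proxy for surface exposure.

```python
# Kyte-Doolittle hydropathy values; negative values indicate hydrophilic residues.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophilic_windows(sequence: str, window: int = 8, top: int = 3):
    """Rank every `window`-residue peptide by mean hydropathy, most hydrophilic first.

    Returns (start, peptide, mean_score) tuples with 1-based start positions.
    """
    sequence = sequence.upper()
    if not 1 <= window <= len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    scored = []
    for i in range(len(sequence) - window + 1):
        peptide = sequence[i : i + window]
        mean_score = sum(KYTE_DOOLITTLE[aa] for aa in peptide) / window
        scored.append((i + 1, peptide, mean_score))
    scored.sort(key=lambda item: item[2])  # lowest hydropathy = most hydrophilic
    return scored[:top]


print(hydrophilic_windows("LLLLLLLLDEKRDEKRLLLL", window=8, top=1))
```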

A THAP-family or THAP domain protein immunogen typically is used to prepare
antibodies by immunizing a suitable subject (e.g., rabbit, goat, mouse or other mammal) with the
immunogen. An appropriate immunogenic preparation can contain, for example, recombinantly
expressed THAP-family or THAP domain protein or a chemically synthesized THAP-family or
THAP domain polypeptide. The preparation can further include an adjuvant, such as Freund's
complete or incomplete adjuvant, or a similar immunostimulatory agent. Immunization of a suitable
subject with an immunogenic THAP-family or THAP domain protein preparation induces a
polyclonal anti-THAP-family or THAP domain protein antibody response.

The invention concerns antibody compositions, either polyclonal or monoclonal, capable of
selectively binding, or that selectively bind, to an epitope-containing polypeptide comprising a
contiguous span of at least 6 amino acids, preferably at least 8 to 10 amino acids, more preferably at
least 12, 15, 20, 25, 30, 40, 50, 100, or more than 100 amino acids of an amino acid sequence
selected from the group consisting of amino acid positions 1 to approximately 90 of SEQ ID NOs:
1-114. The invention also concerns a purified or isolated antibody capable of specifically binding to
a mutated THAP-family or THAP domain protein or to a fragment or variant thereof comprising an
epitope of the mutated THAP-family or THAP domain protein.

Oligomeric Forms of THAP1

Certain embodiments of the present invention encompass THAP1 polypeptides in the form
of oligomers, such as dimers, trimers, or higher oligomers. Oligomers may be formed, for example,
by disulfide bonds between cysteine residues on different THAP1 polypeptides. In other
embodiments, oligomers comprise from two to four THAP1 polypeptides joined by covalent or
non-covalent interactions between peptide moieties fused to the THAP1 polypeptides. Such peptide
moieties may be peptide linkers (spacers), or peptides that have the property of promoting
oligomerization. Leucine zippers and certain polypeptides derived from antibodies are among the
peptides that can promote oligomerization of THAP1 polypeptides attached thereto. DNA
sequences encoding THAP1 oligomers, or fusion proteins that are components of such oligomers,
are provided herein.

In one embodiment of the invention, oligomeric THAP1 may comprise two or more
THAP1 polypeptides joined through peptide linkers. Examples include those peptide linkers
described in U.S. Patent No. 5,073,627. Fusion proteins comprising multiple THAP1 polypeptides
separated by peptide linkers may be produced using conventional recombinant DNA technology.
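
A linked fusion of this kind can be assembled at the sequence level before the encoding DNA is
designed. This is a minimal sketch only: the (Gly4Ser)3 linker is an assumed, commonly used
flexible spacer and is not taken from U.S. Patent No. 5,073,627, and the function name
`build_linked_fusion` is hypothetical. It also reports where each polypeptide unit begins, which
helps when mapping residues of the individual units within the fusion.

```python
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")
DEFAULT_LINKER = "GGGGS" * 3  # assumed flexible (Gly4Ser)3 spacer


def build_linked_fusion(units, linker=DEFAULT_LINKER):
    """Join polypeptide units with a linker; return (sequence, 1-based unit starts)."""
    if len(units) < 2:
        raise ValueError("an oligomeric fusion needs at least two units")
    for part in [*units, linker]:
        unknown = set(part.upper()) - STANDARD_AA
        if unknown:
            raise ValueError(f"non-standard residues: {sorted(unknown)}")
    pieces, starts, position = [], [], 1
    for index, unit in enumerate(units):
        if index:  # a linker separates consecutive units
            pieces.append(linker)
            position += len(linker)
        starts.append(position)
        pieces.append(unit)
        position += len(unit)
    return "".join(pieces), starts


fusion, unit_starts = build_linked_fusion(["MKTAYIAK", "MKTAYIAK"])  # hypothetical units
print(len(fusion), unit_starts)
```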

Another method for preparing THAP1 oligomers involves use of a leucine zipper. Leucine
zipper domains are peptides that promote oligomerization of the proteins in which they are found.
Leucine zippers were originally identified in several DNA-binding proteins (Landschulz et al.,
Science 240:1759, 1988), and have since been found in a variety of different proteins. Among the
known leucine zippers are naturally occurring peptides and derivatives thereof that dimerize or
trimerize. Examples of leucine zipper domains suitable for producing THAP1 oligomers are those
described in International Publication WO 94/10308. Recombinant fusion proteins comprising a
THAP1 polypeptide fused to a peptide that dimerizes or trimerizes in solution are expressed in
suitable host cells, and the resulting soluble oligomeric THAP1 is recovered from the culture
supernatant.

In some embodiments of the invention, a THAP1 dimer is created by fusing THAP1 to an
Fc region polypeptide derived from an antibody, in a manner that does not substantially affect the
binding of THAP1 to the chemokine SLC/CCL21. Preparation of fusion proteins comprising
heterologous polypeptides fused to various portions of antibody-derived polypeptides (including the
Fc region) has been described, e.g., by Ashkenazi et al. (1991) PNAS 88:10535, Byrn et al. (1990)
Nature 344:667, and Hollenbaugh and Aruffo, "Construction of Immunoglobulin Fusion Proteins",
in Current Protocols in Immunology, Supp. 4, pages 10.19.1-10.19.11, 1992. The THAP1/Fc
fusion proteins are allowed to assemble much like antibody molecules, whereupon interchain
disulfide bonds form between Fc polypeptides, yielding divalent THAP1. Similar fusion proteins of
TNF receptors and Fc (see for example Moreland et al. (1997) N. Engl. J. Med. 337(3):141-147;
van der Poll et al. (1997) Blood 89(10):3727-3734; and Ammann et al. (1997) J. Clin. Invest.
99(7):1699-1703) have been used successfully for treating rheumatoid arthritis. Soluble derivatives
have also been made of cell surface glycoproteins in the immunoglobulin gene superfamily,
consisting of an extracellular domain of the cell surface glycoprotein fused to an immunoglobulin
constant (Fc) region (see e.g., Capon, D. J. et al. (1989) Nature 337:525-531 and Capon U.S. Pat.
Nos. 5,116,964 and 5,428,130 [CD4-IgG1 constructs]; Linsley, P. S. et al. (1991) J. Exp. Med.
173:721-730 [a CD28-IgG1 construct and a B7-1-IgG1 construct]; and Linsley, P. S. et al. (1991) J.
Exp. Med. 174:561-569 and U.S. Patent No. 5,434,131 [a CTLA4-IgG1 construct]). Such fusion
proteins have proven useful for modulating receptor-ligand interactions.

Some embodiments relate to THAP-immunoglobulin fusion proteins and to fusions of the THAP
SLC-binding domain with immunoglobulin molecules or fragments thereof. Such fusions can be
produced using standard methods, for example, by creating an expression vector encoding the
SLC/CCL21 chemokine-binding protein THAP1 fused to the antibody polypeptide and inserting the
vector into a suitable host cell. One suitable Fc polypeptide is the native Fc region polypeptide
derived from a human IgG1, which is described in International Publication WO 93/10151. Another
useful Fc polypeptide is the Fc mutein described in U.S. Patent No. 5,457,035. The amino acid
sequence of the mutein is identical to that of the native Fc sequence presented in International
Publication WO 93/10151, except that amino acid 19 has been changed from Leu to Ala, amino
acid 20 has been changed from Leu to Glu, and amino acid 22 has been changed from Gly to Ala.
This mutein Fc exhibits reduced affinity for immunoglobulin receptors.
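
The three substitutions that define the Fc mutein can be applied and checked programmatically.
In this sketch the function name `apply_point_mutations` is hypothetical, and the short input is a
placeholder fragment chosen so that positions 19, 20 and 22 carry Leu, Leu and Gly; the actual
native Fc sequence of WO 93/10151 would have to be substituted. The wild-type residue is verified
before each change, so a numbering mismatch raises an error rather than silently yielding a wrong
mutein.

```python
def apply_point_mutations(sequence, mutations):
    """Apply (wild_type, position, new_residue) substitutions; positions are 1-based."""
    residues = list(sequence)
    for wild_type, position, new_residue in mutations:
        if not 1 <= position <= len(residues):
            raise ValueError(f"position {position} is outside the sequence")
        found = residues[position - 1]
        if found != wild_type:
            raise ValueError(f"expected {wild_type} at {position}, found {found}")
        residues[position - 1] = new_residue
    return "".join(residues)


# Substitutions defining the mutein, as stated in the text above.
FC_MUTEIN_CHANGES = [("L", 19, "A"), ("L", 20, "E"), ("G", 22, "A")]

# Placeholder fragment only; replace with the native Fc sequence of WO 93/10151.
placeholder_fc = "EPKSCDKTHTCPPCPAPELLGGPSVF"
print(apply_point_mutations(placeholder_fc, FC_MUTEIN_CHANGES))
```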

SLC-binding fragments of human THAP1, rather than the full protein, can also be
employed in methods of the invention. Fragments may be less immunogenic than the corresponding
full-length proteins. The ability of a fragment to bind chemokine SLC can be determined using a
standard assay. Fragments can be prepared by any of a number of conventional methods. For
example, a desired DNA sequence can be synthesized chemically or produced by restriction
endonuclease digestion of a full-length cloned DNA sequence and isolated by electrophoresis on
agarose gels. Linkers containing restriction endonuclease cleavage sites can be employed to insert
the desired DNA fragment into an expression vector, or the fragment can be digested at naturally
present cleavage sites. The polymerase chain reaction (PCR) can also be employed to isolate a
DNA sequence encoding a desired protein fragment. Oligonucleotides that define the termini of the
desired fragment are used as 5' and 3' primers in the PCR procedure. Additionally, known
mutagenesis techniques can be used to insert a stop codon at a desired point, e.g., immediately
downstream of the codon for the last amino acid of the desired fragment.
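
The PCR-based isolation of a fragment-encoding sequence can be sketched as follows, assuming a
coding sequence read in frame from its first nucleotide. Here the reverse primer itself carries a
TAA stop codon placed immediately after the last codon of the fragment. The function names, the
18-nucleotide annealing length and the toy sequence are assumptions for illustration; practical
cloning primers would normally also carry restriction sites and, where needed, a start codon.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.translate(COMPLEMENT)[::-1]


def fragment_primers(cds, first_aa, last_aa, anneal_len=18, stop="TAA"):
    """Return (forward, reverse) primers for codons first_aa..last_aa (1-based, inclusive)."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    start_nt, end_nt = (first_aa - 1) * 3, last_aa * 3
    if not 0 <= start_nt < end_nt <= len(cds):
        raise ValueError("fragment lies outside the coding sequence")
    fragment = cds[start_nt:end_nt]
    if len(fragment) < anneal_len:
        raise ValueError("fragment is shorter than the annealing length")
    forward = fragment[:anneal_len]  # 5' primer: sense strand at the fragment start
    # 3' primer: antisense strand covering the last codons plus the added stop codon
    reverse = reverse_complement(fragment[-anneal_len:] + stop)
    return forward, reverse


# Toy coding sequence (Met-Ala-Lys-Arg-Glu-Phe); primers for residues 2-5.
print(fragment_primers("ATGGCTAAACGTGAATTT", 2, 5, anneal_len=6))
```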

In other embodiments, THAP1 or a biologically active fragment thereof, for example, an
SLC-binding domain of THAP1, may be substituted for the variable portion of an antibody heavy or
light chain. If fusion proteins are made with both heavy and light chains of an antibody, it is
possible to form a THAP1 oligomer with at least two, at least three, at least four, at least five, at
least six, at least seven, at least eight, at least nine, or more than nine THAP1 polypeptides.

In some embodiments of the present invention, THAP-SLC binding can be exploited to
decrease the biological availability of SLC or otherwise disrupt the activity of SLC. For example,
THAP-family polypeptides, SLC-binding domains of THAP-family polypeptides, THAP oligomers,
and SLC-binding domain-THAP1-immunoglobulin fusion proteins of the invention can be used to
interact with SLC, thereby preventing it from performing its normal biological role. In some
embodiments, the entire THAP1 polypeptide (SEQ ID NO: 3) can be used to bind to SLC. In other
embodiments, fragments of THAP1, such as the SLC-binding domain of THAP1 (amino acids
143-213 of SEQ ID NO: 3), can be used to bind to SLC. Such fragments can comprise at least 8, at
least 10, at least 12, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45,
at least 50, at least 55, at least 60, at least 65, at least 70, at least 80, at least 90, at least 100, at least
110, at least 120, at least 130, at least 140, at least 150, at least 160, at least 170, at least 180, at
least 190, at least 200, at least 210 or at least 213 consecutive amino acids of SEQ ID NO: 3. In
some embodiments, fragments can comprise at least 8, at least 10, at least 12, at least 15, at least 20,
at least 25, at least 30, at least 35, at least 40, at least 45, at least 50, at least 55, at least 60, at least
65 or at least 70 consecutive amino acids of amino acids 143-213 of SEQ ID NO: 3. THAP-family
polypeptides that may be capable of binding SLC, for example THAP2-11 and THAP0, or
biologically active fragments thereof, can also be used to bind to SLC so as to decrease its biological
availability or otherwise disrupt the activity of this chemokine.
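
The fragment sizes recited above follow directly from the domain boundaries: amino acids 143-213
span 71 residues, so a window of k consecutive residues can start at 71 - k + 1 positions. The
sketch below extracts the domain using 1-based inclusive numbering and enumerates all such
windows; the function names are hypothetical and the 213-residue string is a placeholder, not
SEQ ID NO: 3.

```python
def domain(sequence: str, start: int, end: int) -> str:
    """Return residues start..end using 1-based, inclusive protein numbering."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("domain boundaries lie outside the sequence")
    return sequence[start - 1 : end]


def consecutive_fragments(sequence: str, length: int) -> list[str]:
    """All fragments of exactly `length` consecutive residues (len - length + 1 of them)."""
    if not 1 <= length <= len(sequence):
        raise ValueError("fragment length must be between 1 and the sequence length")
    return [sequence[i : i + length] for i in range(len(sequence) - length + 1)]


# Hypothetical stand-in for the 213-residue THAP1 sequence.
placeholder = ("ACDEFGHIKLMNPQRSTVWY" * 11)[:213]
slc_domain = domain(placeholder, 143, 213)
print(len(slc_domain), len(consecutive_fragments(slc_domain, 8)))
```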



In some embodiments, a plurality of THAP-family proteins, such as a fusion of two or more
THAP1 proteins or fragments thereof which comprise an SLC-binding domain (amino acids 143-213
of SEQ ID NO: 3), can be used to bind SLC. For example, oligomers comprising THAP1
fragments of at least 8, at least 10, at least 12, at least 15, at least 20, at least 25, at least 30,
at least 35, at least 40, at least 45, at least 50, at least 55, at least 60, at least 65 or at least 70
consecutive amino acids of amino acids 143-213 of SEQ ID NO: 3 can be generated. The
fragments which make up the THAP oligomer may be of the same or different lengths. In some
embodiments, the entire THAP1 protein or biologically active portions thereof may be fused
together to form an oligomer capable of binding to SLC. THAP-family polypeptides that may be
capable of binding SLC, for example THAP2-11 and THAP0, the THAP-family polypeptides of
SEQ ID NOs: 16-114, or biologically active fragments thereof can also be used to create oligomers
which bind to SLC so as to decrease its biological availability or otherwise disrupt the activity of
this chemokine.

According to another embodiment of the present invention, THAP-family proteins, such as
THAP1 or a portion of THAP1 which comprises an SLC-binding domain (amino acids 143-213 of
SEQ ID NO: 3), may be fused to an immunoglobulin or a portion thereof. The immunoglobulin
moiety may be an entire immunoglobulin, such as IgG, IgM, IgA or IgE. Additionally, portions of
immunoglobulins, such as an Fc domain of the immunoglobulin, can be fused to a THAP-family
polypeptide, such as THAP1, fragments thereof or oligomers thereof. Fragments of THAP1 can be,
for example, at least 8, at least 10, at least 12, at least 15, at least 20, at least 25, at least 30, at least
35, at least 40, at least 45, at least 50, at least 55, at least 60, at least 65 or at least 70 consecutive
amino acids of amino acids 143-213 of SEQ ID NO: 3. In some embodiments, THAP-family
polypeptides that may be capable of binding SLC, for example THAP2-11 and THAP0, the
THAP-family polypeptides of SEQ ID NOs: 16-114, or biologically active fragments thereof can
also be used to form immunoglobulin fusions that bind to SLC so as to decrease its biological
availability or otherwise disrupt the activity of this chemokine.

In accordance with another aspect of the invention, THAP-family polypeptides, SLC-binding
domains of THAP-family polypeptides, THAP oligomers, and SLC-binding
domain-THAP1-immunoglobulin fusion proteins of the invention can be incorporated into
pharmaceutical compositions. Such pharmaceutical compositions can be used to decrease the
bioavailability and functionality of SLC. For example, THAP-family polypeptides, SLC-binding
domains of THAP-family polypeptides, THAP oligomers, and SLC-binding
domain-THAP1-immunoglobulin fusion proteins of the present invention can be administered to a
subject to inhibit an interaction between SLC and its receptor, such as CCR7, on the surface of
cells, thereby suppressing SLC-mediated responses. Inhibition of chemokine SLC may be useful
therapeutically both for the treatment of inflammatory or proliferative disorders and for modulating
(e.g., promoting or inhibiting) cell differentiation, cell proliferation, and/or cell death.

In an additional embodiment of the present invention, the THAP-family polypeptides, SLC-binding
domains of THAP-family polypeptides, THAP oligomers, and SLC-binding
domain-THAP1-immunoglobulin fusion proteins of the present invention can be used to detect the
presence of SLC in a biological sample and in screening assays to identify molecules which inhibit
the interaction of THAP1 with SLC. Such screening assays are similar to those described below for
PAR4-THAP interactions.

Certain aspects of the present invention relate to a method of identifying a test compound
that modulates THAP-mediated activities. In some cases the THAP-mediated activity is SLC
binding. Test compounds which affect THAP-SLC binding can be identified using a screening
method wherein a THAP-family polypeptide or a biologically active fragment thereof is contacted
with a test compound. In some embodiments, the THAP-family polypeptide comprises an amino
acid sequence having at least 30% amino acid identity to an amino acid sequence of SEQ ID NO: 1
or SEQ ID NO: 2. Whether the test compound modulates the binding of SLC with a THAP-family
polypeptide, such as THAP1 (SEQ ID NO: 3), is assessed by determining whether the test
compound modulates the activity of the THAP-family polypeptide or biologically active fragment
thereof. Biologically active fragments of a THAP-family polypeptide may be at least 5, at least 8, at
least 10, at least 12, at least 15, at least 18, at least 20, at least 25, at least 30, at least 35, at least 40,
at least 45, at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, at least 110, at
least 120, at least 130, at least 140, at least 150, at least 160, at least 170, at least 180, at least 190,
at least 200, at least 210, at least 220, or more than 220 amino acids in length. A
determination that the test compound modulates the activity of said polypeptide indicates that the
test compound is a candidate modulator of THAP-mediated activities.
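
Whether a test compound modulates THAP-SLC binding is commonly expressed as percent inhibition
relative to controls. The sketch below assumes a quantitative binding readout with vehicle-only
wells (full binding) and background wells (no THAP polypeptide), and an arbitrary 50% cut-off for
calling candidates; none of these choices is specified above, and the function names are
hypothetical. Negative values indicate that binding was enhanced rather than inhibited.

```python
from statistics import mean


def percent_inhibition(signal, vehicle_signals, background_signals):
    """Inhibition of the binding signal relative to vehicle (0%) and background (100%)."""
    top = mean(vehicle_signals)
    bottom = mean(background_signals)
    window = top - bottom
    if window <= 0:
        raise ValueError("vehicle controls must exceed background")
    return 100.0 * (1.0 - (signal - bottom) / window)


def classify(pct, threshold=50.0):
    """Label a compound by an assumed symmetric cut-off on percent inhibition."""
    if pct >= threshold:
        return "candidate inhibitor"
    if pct <= -threshold:
        return "candidate enhancer"
    return "inactive"


pct = percent_inhibition(550, vehicle_signals=[1000, 1000], background_signals=[100, 100])
print(pct, classify(pct))
```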

Although THAP-family polypeptides, SLC-binding domains of THAP-family polypeptides,
THAP oligomers, and SLC-binding domain-THAP1-immunoglobulin fusion proteins can be used
for the above-mentioned SLC interactions, it will be appreciated that homologs of each of these
molecules can be used in their place. For example, homologs having at least about 30-40%
identity, preferably at least about 40-50% identity, more preferably at least about 50-60%, and even
more preferably at least about 60-70%, 70-80%, 80%, 90%, 95%, 97%, 98%, 99% or 99.8% identity
across the amino acid sequences of SEQ ID NOs: 1-114 or portions thereof can be used.
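
The identity figures above depend on how percent identity is computed. The minimal sketch below
assumes the two sequences have already been aligned (gaps written as "-") and counts identical
positions over all aligned columns except gap-gap columns; other conventions, such as dividing by
the length of the shorter sequence, give different values. The function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Identical positions / aligned columns (gap-gap columns ignored), as a percentage."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap and y == gap:
            continue  # column contributes no residue from either sequence
        compared += 1
        if x == y:  # gap-gap columns were skipped, so this is a residue match
            identical += 1
    if compared == 0:
        raise ValueError("alignment contains no residues")
    return 100.0 * identical / compared


def meets_identity_threshold(aligned_a, aligned_b, threshold=30.0):
    return percent_identity(aligned_a, aligned_b) >= threshold


print(round(percent_identity("ACDE-G", "ACDKLG"), 1))
```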

Primers and probes 

Primers and probes of the invention can be prepared by any suitable method, including, for
example, cloning and restriction of appropriate sequences and direct chemical synthesis by a
method such as the phosphotriester method of Narang SA et al. (Methods Enzymol 1979;68:90-98),
the phosphodiester method of Brown EL et al. (Methods Enzymol 1979;68:109-151), the
diethylphosphoramidite method of Beaucage et al. (Tetrahedron Lett 1981;22:1859-1862) and the
solid support method described in EP 0 707 592.

Detection probes are generally nucleic acid sequences or uncharged nucleic acid analogs,
such as, for example, peptide nucleic acids, which are disclosed in International Patent Application
WO 92/20702, or morpholino analogs, which are described in U.S. Patent Nos. 5,185,444,
5,034,506 and 5,142,047. If desired, the probe may be rendered "non-extendable" in that additional
dNTPs cannot be added to the probe. Analogs are usually non-extendable in and of themselves, and
nucleic acid probes can be rendered non-extendable by modifying the 3' end of the probe such that
the hydroxyl group is no longer capable of participating in elongation. For example, the 3' end of
the probe can be functionalized with the capture or detection label to thereby consume or otherwise
block the hydroxyl group.

Any of the polynucleotides of the present invention can be labeled, if desired, by
incorporating any label known in the art to be detectable by spectroscopic, photochemical,
biochemical, immunochemical, or chemical means. For example, useful labels include radioactive
substances (including ³²P, ³⁵S, ³H and ¹²⁵I), fluorescent dyes (including 5-bromodeoxyuridine,
fluorescein, acetylaminofluorene, digoxigenin) or biotin. Preferably, polynucleotides are labeled at
their 3' and 5' ends. Examples of non-radioactive labeling of nucleic acid fragments are described
in Urdea et al. (Nucleic Acids Research 11:4937-4957, 1988) or Sanchez-Pescador et al. (J. Clin.
Microbiol. 26(10):1934-1938, 1988). In addition, the probes according to the present invention
may have structural characteristics that allow signal amplification, such structural
characteristics being, for example, branched DNA probes such as those described by Urdea et al.
(Nucleic Acids Symp. Ser. 24:197-200, 1991) or in European Patent No. EP 0 225 807 (Chiron).

A label can also be used to capture the primer, so as to facilitate the immobilization of
either the primer or a primer extension product, such as amplified DNA, on a solid support. A
capture label is attached to the primers or probes and can be a specific binding member which forms
a binding pair with the solid phase reagent's specific binding member (e.g., biotin and
streptavidin). Therefore, depending upon the type of label carried by a polynucleotide or a probe, it
may be employed to capture or to detect the target DNA. Further, it will be understood that the
polynucleotides, primers or probes provided herein may themselves serve as the capture label.
For example, in the case where a solid phase reagent's binding member is a nucleic acid sequence,
it may be selected such that it binds a complementary portion of a primer or probe to thereby
immobilize the primer or probe to the solid phase. In cases where a polynucleotide probe itself
serves as the binding member, those skilled in the art will recognize that the probe will contain a
sequence or "tail" that is not complementary to the target. In the case where a polynucleotide
primer itself serves as the capture label, at least a portion of the primer will be free to hybridize with
a nucleic acid on a solid phase. DNA labeling techniques are well known to the skilled technician.

The probes of the present invention are useful for a number of purposes. They can notably be
used in Southern hybridization to genomic DNA. The probes can also be used to detect
PCR amplification products. They may also be used to detect mismatches in a THAP-family gene
or mRNA using other techniques.

Any of the nucleic acids, polynucleotides, primers and probes of the present invention can 
be conveniently immobilized on a solid support. Solid supports are known to those skilled in the art 
and include the walls of wells of a reaction tray, test tubes, polystyrene beads, magnetic beads, 
nitrocellulose strips, membranes, microparticles such as latex particles, sheep (or other animal) red 
blood cells, duracytes and others. The solid support is not critical and can be selected by one 
skilled in the art. Thus, latex particles, microparticles, magnetic or non-magnetic beads, 
membranes, plastic tubes, walls of microtiter wells, glass or silicon chips, sheep (or other suitable 
animal's) red blood cells and duracytes are all suitable examples. Suitable methods for 
immobilizing nucleic acids on solid phases include ionic, hydrophobic, covalent interactions and 
the like. A solid support, as used herein, refers to any material which is insoluble, or can be made 
insoluble by a subsequent reaction. The solid support can be chosen for its intrinsic ability to attract 
and immobilize the capture reagent. Alternatively, the solid phase can retain an additional receptor 
which has the ability to attract and immobilize the capture reagent. The additional receptor can 
include a charged substance that is oppositely charged with respect to the capture reagent itself or to 
a charged substance conjugated to the capture reagent. As yet another alternative, the receptor 
molecule can be any specific binding member which is immobilized upon (attached to) the solid 
support and which has the ability to immobilize the capture reagent through a specific binding 
reaction. The receptor molecule enables the indirect binding of the capture reagent to a solid 
support material before the performance of the assay or during the performance of the assay. The 
solid phase thus can be a plastic, derivatized plastic, magnetic or non-magnetic metal, glass or 
silicon surface of a test tube, microtiter well, sheet, bead, microparticle, chip, sheep (or other 
suitable animal's) red blood cells, duracytes and other configurations known to those of ordinary 
skill in the art. The nucleic acids, polynucleotides, primers and probes of the invention can be
attached to or immobilized on a solid support individually or in groups of at least 2, 5, 8, 10, 12, 15,
20, or 25 distinct polynucleotides of the invention per solid support. In addition,
polynucleotides other than those of the invention may be attached to the same solid support as one
or more polynucleotides of the invention.

Any polynucleotide provided herein may be attached in overlapping areas or at random
locations on a solid support. Alternatively, the polynucleotides of the invention may be attached in
an ordered array wherein each polynucleotide is attached to a distinct region of the solid support
which does not overlap with the attachment site of any other polynucleotide. Preferably, such an
ordered array of polynucleotides is designed to be "addressable", where the distinct locations are
recorded and can be accessed as part of an assay procedure. Addressable polynucleotide arrays
typically comprise a plurality of different oligonucleotide probes that are coupled to a surface of a
substrate in different known locations. Knowledge of the precise location of each
polynucleotide makes these "addressable" arrays particularly useful in hybridization
assays. Any addressable array technology known in the art can be employed with the
polynucleotides of the invention. One particular embodiment of these polynucleotide arrays is
known as GeneChips, and has been generally described in U.S. Patent 5,143,854 and PCT
publications WO 90/15070 and WO 92/10092.
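
The bookkeeping behind an addressable array amounts to a two-way map between probe identities
and recorded locations. In the hypothetical Python class below, each probe occupies one distinct,
non-overlapping (row, column) site, and a scanned map of signal intensities is translated back into
per-probe values; the class, method and probe names are illustrative assumptions and do not
describe any particular array platform.

```python
from dataclasses import dataclass, field


@dataclass
class AddressableArray:
    rows: int
    cols: int
    _probe_at: dict = field(default_factory=dict, init=False, repr=False)
    _site_of: dict = field(default_factory=dict, init=False, repr=False)

    def attach(self, probe_id: str, row: int, col: int) -> None:
        """Record that `probe_id` occupies one distinct, non-overlapping site."""
        if not (0 <= row < self.rows and 0 <= col < self.cols):
            raise ValueError("site is outside the array")
        if (row, col) in self._probe_at:
            raise ValueError("site already occupied")
        if probe_id in self._site_of:
            raise ValueError("probe already attached")
        self._probe_at[(row, col)] = probe_id
        self._site_of[probe_id] = (row, col)

    def site_of(self, probe_id: str) -> tuple:
        return self._site_of[probe_id]

    def read(self, signals: dict) -> dict:
        """Map scanned intensities {(row, col): value} back to probe identities."""
        return {probe: signals.get(site, 0.0) for probe, site in self._site_of.items()}


array = AddressableArray(rows=2, cols=2)
array.attach("probe_A", 0, 0)
array.attach("control", 1, 1)
print(array.read({(0, 0): 512.0}))
```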

Recombinant Expression Vectors and Host Cells 

Another aspect of the invention pertains to vectors, preferably expression vectors,
containing a nucleic acid encoding a THAP-family or THAP domain polypeptide, or a biologically
active fragment or homologue thereof.

Vectors may have particular use in the preparation of a recombinant protein of the
invention, or for use in gene therapy. Gene therapy presents a means to deliver a THAP-family or
THAP domain polypeptide, or a biologically active fragment or homologue thereof, to a subject in
order to regulate apoptosis for treatment of a disorder.

As used herein, the term "vector" refers to a nucleic acid molecule capable of transporting 
another nucleic acid to which it has been linked. One type of vector is a "plasmid", which refers to a 
circular double stranded DNA loop into which additional DNA segments can be ligated. Another 
type of vector is a viral vector, wherein additional DNA segments can be ligated into the viral 
genome. Certain vectors are capable of autonomous replication in a host cell into which they are 
introduced (e.g., bacterial vectors having a bacterial origin of replication and episomal mammalian 
vectors). Other vectors (e.g., non-episomal mammalian vectors) are integrated into the genome of a 
host cell upon introduction into the host cell, and thereby are replicated along with the host genome. 
Moreover, certain vectors are capable of directing the expression of genes to which they are 
operatively linked. Such vectors are referred to herein as "expression vectors". In general, 
expression vectors of utility in recombinant DNA techniques are often in the form of plasmids. In 
the present specification, "plasmid" and "vector" can be used interchangeably as the plasmid is the 
most commonly used form of vector. However, the invention is intended to include such other 
forms of expression vectors, such as viral vectors (e.g., replication defective retroviruses, 
adenoviruses and adeno-associated viruses), which serve equivalent functions. 

The recombinant expression vectors of the invention comprise a THAP-family or THAP
domain nucleic acid of the invention in a form suitable for expression of the nucleic acid in a host
cell, which means that the recombinant expression vectors include one or more regulatory
sequences, selected on the basis of the host cells to be used for expression, which are operatively
linked to the nucleic acid sequence to be expressed. Within a recombinant expression vector,
"operably linked" is intended to mean that the nucleotide sequence of interest is linked to the
regulatory sequence(s) in a manner which allows for expression of the nucleotide sequence (e.g., in
an in vitro transcription/translation system or in a host cell when the vector is introduced into the
host cell). The term "regulatory sequence" is intended to include promoters, enhancers and other
expression control elements (e.g., polyadenylation signals). Such regulatory sequences are
described, for example, in Goeddel, Gene Expression Technology: Methods in Enzymology 185,
Academic Press, San Diego, Calif. (1990). Regulatory sequences include those which direct
constitutive expression of a nucleotide sequence in many types of host cell and those which direct
expression of the nucleotide sequence only in certain host cells (e.g., tissue-specific regulatory
sequences). It will be appreciated by those skilled in the art that the design of the expression vector
can depend on such factors as the choice of the host cell to be transformed, the level of expression
of protein desired, etc. The expression vectors of the invention can be introduced into host cells to
thereby produce proteins or peptides, including fusion proteins or peptides, encoded by nucleic
acids as described herein (e.g., THAP-family proteins, mutant forms of THAP-family proteins,
fusion proteins, or fragments of any of the preceding proteins, etc.).

The recombinant expression vectors of the invention can be designed for expression of a
THAP-family or THAP domain polypeptide, or a biologically active fragment or homologue thereof,
in prokaryotic or eukaryotic cells. For example, THAP-family or THAP domain proteins can be
expressed in bacterial cells such as E. coli, insect cells (using baculovirus expression vectors), yeast
cells, or mammalian cells. Suitable host cells are discussed further in Goeddel, Gene Expression
Technology: Methods in Enzymology 185, Academic Press, San Diego, Calif. (1990).
Alternatively, the recombinant expression vector can be transcribed and translated in vitro, for
example using T7 promoter regulatory sequences and T7 polymerase.

Expression of proteins in prokaryotes is most often carried out in E. coli with vectors 
containing constitutive or inducible promoters directing the expression of either fusion or non- 
fusion proteins. Fusion vectors add a number of amino acids to a protein encoded therein, usually to 
the amino terminus of the recombinant protein. Such fusion vectors typically serve three 
purposes: 1) to increase expression of recombinant protein; 2) to increase the solubility of the 
recombinant protein; and 3) to aid in the purification of the recombinant protein by acting as a 
ligand in affinity purification. Often, in fusion expression vectors, a proteolytic cleavage site is
introduced at the junction of the fusion moiety and the recombinant protein to enable separation of
the recombinant protein from the fusion moiety subsequent to purification of the fusion protein.
Enzymes used for this purpose, and their cognate recognition sequences, include Factor Xa, thrombin
and enterokinase. Typical fusion expression vectors include pGEX (Pharmacia Biotech Inc; Smith, D.
B. and Johnson, K. S. (1988) Gene 67:31-40), pMAL (New England Biolabs, Beverly, Mass.) and 
pRIT5 (Pharmacia, Piscataway, N.J.), which fuse glutathione S-transferase (GST), maltose E 
binding protein, or protein A, respectively, to the target recombinant protein. 
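
Release of the recombinant protein at an engineered cleavage site can be modeled on the protein
sequence. The sketch uses simplified recognition motifs commonly cited for these enzymes (Factor
Xa after Ile-Glu-Gly-Arg, thrombin between Arg and Gly in Leu-Val-Pro-Arg-Gly-Ser, enterokinase
after Asp-Asp-Asp-Asp-Lys); real protease specificity is broader, so this illustrates the principle
only. The function name and the fusion sequences are hypothetical.

```python
# Simplified motifs: (recognition sequence, motif residues kept N-terminal to the cut).
CLEAVAGE_MOTIFS = {
    "factor_xa": ("IEGR", 4),      # cuts after Arg of Ile-Glu-Gly-Arg
    "thrombin": ("LVPRGS", 4),     # cuts between Arg and Gly
    "enterokinase": ("DDDDK", 5),  # cuts after Lys of Asp-Asp-Asp-Asp-Lys
}


def release_target(fusion: str, protease: str) -> tuple[str, str]:
    """Split a fusion protein at a single cleavage site; return (tag_side, target_side)."""
    motif, cut_offset = CLEAVAGE_MOTIFS[protease]
    index = fusion.find(motif)
    if index == -1:
        raise ValueError(f"no {protease} site found")
    if fusion.find(motif, index + 1) != -1:
        raise ValueError(f"more than one {protease} site; cleavage would be ambiguous")
    cut = index + cut_offset
    return fusion[:cut], fusion[cut:]


print(release_target("MSPILGYWKIEGRMKLT", "factor_xa"))
```

Rejecting sequences with a second copy of the motif mirrors a practical design concern: an
internal site within the recombinant protein itself would also be cleaved.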
Purified fusion proteins can be utilized in THAP-family activity assays (e.g., the direct assays
or competitive assays described in detail below), or to generate antibodies specific for
THAP-family or THAP domain proteins, for example. In a preferred embodiment, a retroviral
expression vector of the present invention encoding a THAP-family or THAP domain fusion protein
can be utilized to infect bone marrow cells, which are subsequently transplanted into irradiated
recipients. The pathology of the subject recipient is then examined after sufficient time has passed
(e.g., six weeks).

Examples of suitable inducible non-fusion E. coli expression vectors include pTrc (Amann
et al., (1988) Gene 69:301-315) and pET 11d (Studier et al., Gene Expression Technology: Methods
in Enzymology 185, Academic Press, San Diego, Calif. (1990) 60-89). Target gene expression from
the pTrc vector relies on host RNA polymerase transcription from a hybrid trp-lac fusion promoter.
Target gene expression from the pET 11d vector relies on transcription from a T7 gn10-lac fusion
promoter mediated by a coexpressed viral RNA polymerase (T7 gn1). This viral polymerase is
supplied by host strains BL21(DE3) or HMS174(DE3) from a resident prophage harboring a T7
gn1 gene under the transcriptional control of the lacUV5 promoter.

One strategy to maximize recombinant protein expression in E. coli is to express the protein
in a host bacterium with an impaired capacity to proteolytically cleave the recombinant protein
(Gottesman, S., Gene Expression Technology: Methods in Enzymology 185, Academic Press, San
Diego, Calif. (1990) 119-128). Another strategy is to alter the sequence of the nucleic acid to be
inserted into an expression vector so that the individual codons for each amino acid are those
preferentially utilized in E. coli (Wada et al., (1992) Nucleic Acids Res. 20:2111-2118). Such
alteration of nucleic acid sequences of the invention can be carried out by standard DNA synthesis
techniques.
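The codon-substitution strategy can be sketched as a back-translation in which every amino acid is encoded by one codon assumed to be preferred in E. coli. The `PREFERRED_CODON` table is an illustrative assumption (one frequently used E. coli codon per residue), not a table given in this document. A real design would start from a measured codon-usage table and would also weigh factors such as mRNA secondary structure, which this sketch ignores.

```python
# Illustrative "preferred codon" table for E. coli (assumption; one codon per
# residue, "*" = stop). All codons encode the listed residue in the standard code.
PREFERRED_CODON = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
    "*": "TAA",
}


def back_translate(protein: str) -> str:
    """Return a DNA coding sequence for `protein` using the preferred codons."""
    try:
        return "".join(PREFERRED_CODON[aa] for aa in protein.upper())
    except KeyError as exc:
        raise ValueError(f"unsupported residue: {exc.args[0]!r}") from None
```

For example, `back_translate("MKV*")` returns `"ATGAAAGTGTAA"`. The resulting sequence would then be produced by DNA synthesis, as the text notes.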

In another embodiment, the THAP-family expression vector is a yeast expression vector. 
Examples of vectors for expression in the yeast S. cerevisiae include pYepSec1 (Baldari et al., (1987)
EMBO J. 6:229-234), pMFa (Kurjan and Herskowitz, (1982) Cell 30:933-943), pJRY88 (Schultz et
al., (1987) Gene 54:113-123), pYES2 (Invitrogen Corporation, San Diego, Calif.), and picZ
(Invitrogen Corp., San Diego, Calif.).

Alternatively, THAP-family or THAP domain proteins can be expressed in insect cells 
using baculovirus expression vectors. Baculovirus vectors available for expression of proteins in 
cultured insect cells (e.g., Sf9 cells) include the pAc series (Smith et al. (1983) Mol. Cell Biol.
3:2156-2165) and the pVL series (Luckow and Summers (1989) Virology 170:31-39). In
particularly preferred embodiments, THAP-family proteins are expressed according to Karniski et
al., Am. J. Physiol. (1998) 275: F79-87.

In yet another embodiment, a nucleic acid of the invention is expressed in mammalian cells 
using a mammalian expression vector. Examples of mammalian expression vectors include pCDM8 
(Seed, B. (1987) Nature 329:840) and pMT2PC (Kaufman et al. (1987) EMBO J. 6:187-195). When
used in mammalian cells, the expression vector's control functions are often provided by viral
regulatory elements. For example, commonly used promoters are derived from polyoma,
Adenovirus 2, cytomegalovirus and Simian Virus 40. For other suitable expression systems for both
prokaryotic and eukaryotic cells see chapters 16 and 17 of Sambrook, J., Fritsch, E. F., and Maniatis,
T. Molecular Cloning: A Laboratory Manual. 2nd ed., Cold Spring Harbor Laboratory, Cold Spring
Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1989. In another embodiment, the
recombinant mammalian expression vector is capable of directing expression of the nucleic acid
preferentially in a particular cell type (e.g., tissue-specific regulatory elements are used to express
the nucleic acid). Tissue-specific regulatory elements are known in the art, and are further described
below.

The invention further provides a recombinant expression vector comprising a DNA 
molecule of the invention cloned into the expression vector in an antisense orientation. That is, the 
DNA molecule is operatively linked to a regulatory sequence in a manner which allows for 
expression (by transcription of the DNA molecule) of an RNA molecule which is antisense to 
THAP-family mRNA. Regulatory sequences operatively linked to a nucleic acid cloned in the 
antisense orientation can be chosen which direct the continuous expression of the antisense RNA 
molecule in a variety of cell types, for instance viral promoters and/or enhancers, or regulatory 
sequences can be chosen which direct constitutive, tissue specific or cell type specific expression of 
antisense RNA. The antisense expression vector can be in the form of a recombinant plasmid, 
phagemid or attenuated virus in which antisense nucleic acids are produced under the control of a 
high efficiency regulatory region, the activity of which can be determined by the cell type into 
which the vector is introduced. For a discussion of the regulation of gene expression using antisense
genes see Weintraub, H. et al., Antisense RNA as a molecular tool for genetic analysis, Reviews -
Trends in Genetics, Vol. 1(1) 1986.

Another aspect of the invention pertains to host cells into which a recombinant expression 
vector of the invention has been introduced. The terms "host cell" and "recombinant host cell" are 
used interchangeably herein. It is understood that such terms refer not only to the particular subject
cell but to the progeny or potential progeny of such a cell. Because certain modifications may occur 
in succeeding generations due to either mutation or environmental influences, such progeny may 
not, in fact, be identical to the parent cell, but are still included within the scope of the term as used 
herein. 

A host cell can be any prokaryotic or eukaryotic cell. For example, a THAP-family protein 
can be expressed in bacterial cells such as E. coli, insect cells, yeast or mammalian cells (such as 
Chinese hamster ovary cells (CHO) or COS cells or human cells). Other suitable host cells are 
known to those skilled in the art, including mouse 3T3 cells as further described in the Examples. 

Vector DNA can be introduced into prokaryotic or eukaryotic cells via conventional
transformation or transfection techniques. As used herein, the terms "transformation" and
"transfection" are intended to refer to a variety of art-recognized techniques for introducing foreign
nucleic acid (e.g., DNA) into a host cell, including calcium phosphate or calcium chloride
co-precipitation, DEAE-dextran-mediated transfection, lipofection, or electroporation. Suitable
methods for transforming or transfecting host cells can be found in Sambrook et al. (Molecular
Cloning: A Laboratory Manual. 2nd ed., Cold Spring Harbor Laboratory, Cold Spring Harbor
Laboratory Press, Cold Spring Harbor, N.Y., 1989), and other laboratory manuals.

For stable transfection of mammalian cells, it is known that, depending upon the expression 
vector and transfection technique used, only a small fraction of cells may integrate the foreign DNA 
into their genome. In order to identify and select these integrants, a gene that encodes a selectable 
marker (e.g., resistance to antibiotics) is generally introduced into the host cells along with the gene 
of interest. Preferred selectable markers include those which confer resistance to drugs, such as 
G418, hygromycin and methotrexate. Nucleic acid encoding a selectable marker can be introduced 
into a host cell on the same vector as that encoding a THAP-family protein or can be introduced on 
a separate vector. Cells stably transfected with the introduced nucleic acid can be identified by drug 
selection (e.g., cells that have incorporated the selectable marker gene will survive, while the other 
cells die). 

A host cell of the invention, such as a prokaryotic or eukaryotic host cell in culture, can be
used to produce (i.e., express) a THAP-family protein. Accordingly, the invention further provides
methods for producing a THAP-family protein using the host cells of the invention. In one
embodiment, the method comprises culturing the host cell of the invention (into which a recombinant
expression vector encoding a THAP-family protein has been introduced) in a suitable medium such
that a THAP-family protein is produced. In another embodiment, the method further comprises
isolating a THAP-family protein from the medium or the host cell.

In another embodiment, the invention encompasses a method comprising: providing a cell
capable of expressing a THAP family or THAP domain polypeptide, or a biologically active 
fragment or homologue thereof, culturing said cell in a suitable medium such that a THAP-family 
or THAP domain protein is produced, and isolating or purifying the THAP-family or THAP domain 
protein from the medium or cell. 

The host cells of the invention can also be used to produce nonhuman transgenic animals, 
such as for the study of disorders in which THAP family proteins are implicated. For example, in 
one embodiment, a host cell of the invention is a fertilized oocyte or an embryonic stem cell into 
which THAP-family- or THAP domain- coding sequences have been introduced. Such host cells 
can then be used to create non-human transgenic animals in which exogenous THAP-family or 
THAP domain sequences have been introduced into their genome or homologous recombinant 
animals in which endogenous THAP-family or THAP domain sequences have been altered. Such 
animals are useful for studying the function and/or activity of a THAP-family or THAP domain 
polypeptide or fragment thereof and for identifying and/or evaluating modulators of a THAP-family 
or THAP domain activity. As used herein, a "transgenic animal" is a non-human animal, preferably
a mammal, more preferably a rodent such as a rat or mouse, in which one or more of the cells of the
animal includes a transgene. Other examples of transgenic animals include non-human primates, 
sheep, dogs, cows, goats, chickens, amphibians, etc. A transgene is exogenous DNA which is 
integrated into the genome of a cell from which a transgenic animal develops and which remains in 
the genome of the mature animal, thereby directing the expression of an encoded gene product in 
one or more cell types or tissues of the transgenic animal. As used herein, a "homologous 
recombinant animal" is a non-human animal, preferably a mammal, more preferably a mouse, in 
which an endogenous THAP-family or THAP domain gene has been altered by homologous 
recombination between the endogenous gene and an exogenous DNA molecule introduced into a 
cell of the animal, e.g., an embryonic cell of the animal, prior to development of the animal. 
Methods for generating transgenic animals via embryo manipulation and microinjection, 
particularly animals such as mice, have become conventional in the art and are described, for 
example, in U.S. Pat. Nos. 4,736,866 and 4,870,009, both by Leder et al., U.S. Pat. No. 4,873,191 
by Wagner et al. and in Hogan, B., Manipulating the Mouse Embryo, (Cold Spring Harbor 
Laboratory Press, Cold Spring Harbor, N.Y., 1986). 
Gene Therapy Vectors 

Preferred vectors for administration to a subject can be constructed according to well-known
methods. Vectors will comprise regulatory elements (e.g., promoter, enhancer, etc.) capable of
directing the expression of the nucleic acid in the targeted cell. Thus, where a human cell is
targeted, it is preferable to position the nucleic acid coding region adjacent to and under the control
of a promoter that is active in a human cell.

In various embodiments, the human cytomegalovirus (CMV) immediate early gene
promoter, the SV40 early promoter, the Rous sarcoma virus long terminal repeat, the beta-actin
promoter, the rat insulin promoter and the glyceraldehyde-3-phosphate dehydrogenase promoter can
be used to obtain high-level expression of the coding sequence of interest. The use of other viral,
mammalian cellular or bacteriophage promoters which are well-known in the art to achieve expression of a coding
sequence of interest is contemplated as well, provided that the levels of expression are sufficient for 
a given purpose. By employing a promoter with well-known properties, the level and pattern of 
expression of the protein of interest following transfection or transformation can be optimized. 

Selection of a promoter that is regulated in response to specific physiologic or synthetic 
signals can permit inducible expression of the gene product. For example, where expression of a
transgene (or of transgenes, when a multicistronic vector is utilized) is toxic to the cells in which
the vector is produced, it may be desirable to prohibit or reduce expression of one or
more of the transgenes. Several inducible promoter systems are available for production of viral 
vectors where the transgene product may be toxic. 
The ecdysone system (Invitrogen, Carlsbad, CA) is one such system. This system is
designed to allow regulated expression of a gene of interest in mammalian cells. It consists of a
tightly regulated expression mechanism that allows virtually no basal level expression of the
transgene, but over 200-fold inducibility. The system is based on the heterodimeric ecdysone
receptor of Drosophila: when ecdysone or an analog such as muristerone A binds to the
receptor, the receptor activates a promoter to turn on expression of the downstream transgene, and
high levels of mRNA transcripts are attained. In this system, both monomers of the heterodimeric
receptor are constitutively expressed from one vector, whereas the ecdysone-responsive promoter
which drives expression of the gene of interest is on another plasmid. Engineering of this type of
system into the gene transfer vector of interest would therefore be useful. Cotransfection of
plasmids containing the gene of interest and the receptor monomers in the producer cell line would
then allow for the production of the gene transfer vector without expression of a potentially toxic
transgene. At the appropriate time, expression of the transgene could be activated with ecdysone or
muristerone A.

Another inducible system that would be useful is the Tet-Off or Tet-On system
(Clontech, Palo Alto, CA) originally developed by Gossen and Bujard (Gossen and Bujard, 1992;
Gossen et al., 1995). This system also allows high levels of gene expression to be regulated in
response to tetracycline or tetracycline derivatives such as doxycycline. In the Tet-On system, gene
expression is turned on in the presence of doxycycline, whereas in the Tet-Off system, gene
expression is turned on in the absence of doxycycline. These systems are based on two regulatory
elements derived from the tetracycline resistance operon of E. coli: the tetracycline operator
sequence, to which the tetracycline repressor binds, and the tetracycline repressor protein. The gene
of interest is cloned into a plasmid behind a promoter that has tetracycline-responsive elements
present in it. A second plasmid contains a regulatory element called the tetracycline-controlled
transactivator, which is composed, in the Tet-Off system, of the VP16 domain from the herpes
simplex virus and the wild-type tetracycline repressor. Thus, in the absence of doxycycline,
transcription is constitutively on. In the Tet-On system, the tetracycline repressor is not wild-type
and in the presence of doxycycline activates transcription. For gene therapy vector production, the
Tet-Off system would be preferable so that the producer cells could be grown in the presence of
tetracycline or doxycycline to prevent expression of a potentially toxic transgene, but when the
vector is introduced to the patient, the gene expression would be constitutively on.
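The on/off logic of the two tetracycline-regulated configurations described above can be sketched as a Boolean switch. This is a minimal sketch: the enum and function names are hypothetical, and the model deliberately ignores basal leakiness, dose-response and induction kinetics, all of which matter in practice.

```python
from enum import Enum


class TetSystem(Enum):
    """The two configurations described in the text (names are illustrative)."""
    TET_OFF = "Tet-Off"  # transactivator: VP16 fused to the wild-type tet repressor
    TET_ON = "Tet-On"    # transactivator: VP16 fused to a mutant tet repressor


def transgene_expressed(system: TetSystem, doxycycline: bool) -> bool:
    """Boolean abstraction of transgene state for a system and drug condition."""
    if system is TetSystem.TET_OFF:
        # Doxycycline releases the wild-type repressor domain from the operator,
        # so transcription is on only when the drug is absent.
        return not doxycycline
    # The mutant repressor domain binds the operator only with doxycycline bound.
    return doxycycline


if __name__ == "__main__":
    for system in TetSystem:
        for dox in (False, True):
            state = "on" if transgene_expressed(system, dox) else "off"
            print(f"{system.value:8s} doxycycline={dox!s:5s} -> {state}")
```

Under this model, a Tet-Off producer line grown with doxycycline gives `transgene_expressed(TetSystem.TET_OFF, True) == False`, which is the property that makes Tet-Off preferable for vector production as described above.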

In some circumstances, it may be desirable to regulate expression of a transgene in a gene
therapy vector. For example, different viral promoters with varying strengths of activity may be
utilized depending on the level of expression desired. In mammalian cells, the CMV immediate
early promoter is often used to provide strong transcriptional activation. Modified versions of the
CMV promoter that are less potent have also been used when reduced levels of expression of the
transgene are desired. When expression of a transgene in hematopoietic cells is desired, retroviral
promoters such as the LTRs from MLV or MMTV are often used. Other viral promoters that may
be used depending on the desired effect include SV40, RSV LTR, HIV-1 and HIV-2 LTR,
adenovirus promoters such as from the E1A, E2A, or MLP region, AAV LTR, cauliflower mosaic
virus, HSV-TK, and avian sarcoma virus.

Similarly, tissue-specific promoters may be used to effect transcription in specific tissues or
cells so as to reduce potential toxicity or undesirable effects to non-targeted tissues. For example, 
promoters such as the PSA, probasin, prostatic acid phosphatase or prostate-specific glandular 
kallikrein (hK2) may be used to target gene expression in the prostate. Similarly, promoters as 
follows may be used to target gene expression in other tissues. 

Tissue-specific promoters include, for (a) pancreas: insulin, elastin, amylase, pdr-1, pdx-1,
glucokinase; (b) liver: albumin, PEPCK, HBV enhancer, alpha-fetoprotein, apolipoprotein C, alpha-1
antitrypsin, vitellogenin, NF-AB, transthyretin; (c) skeletal muscle: myosin H chain, muscle
creatine kinase, dystrophin, calpain p94, skeletal alpha-actin, fast troponin 1; (d) skin: keratin K6,
keratin K1; (e) lung: CFTR, human cytokeratin 18 (K18), pulmonary surfactant proteins A, B and
C, CC-10, Pi; (f) smooth muscle: sm22 alpha, SM-alpha-actin; (g) endothelium: endothelin-1,
E-selectin, von Willebrand factor, TIE (Korhonen et al., 1995), KDR/flk-1; (h) melanocytes:
tyrosinase; (i) adipose tissue: lipoprotein lipase (Zechner et al., 1988), adipsin (Spiegelman et al.,
1989), acetyl-CoA carboxylase (Pape and Kim, 1989), glycerophosphate dehydrogenase (Dani et
al., 1989), adipocyte P2 (Hunt et al., 1986); and (j) blood: beta-globin.

In certain indications, it may be desirable to activate transcription at specific times after 
administration of the gene therapy vector. This may be done with such promoters as those that are 
hormone or cytokine regulable. For example in gene therapy applications where the indication is 
in a gonadal tissue where specific steroids are produced or routed to, use of androgen or estrogen 
regulated promoters may be advantageous. Such promoters that are hormone regulatable include 
MMTV, MT-1, ecdysone and RuBisco. Other hormone regulated promoters such as those 
responsive to thyroid, pituitary and adrenal hormones are expected to be useful in the present 
invention. Cytokine and inflammatory protein responsive promoters that could be used include K 
and T Kininogen (Kageyama et al., 1987), c-fos, TNF-alpha, C-reactive protein (Arcone et al., 
1988), haptoglobin (Oliviero et al., 1987), serum amyloid A2, C/EBP alpha, IL-1, IL-6 (Poli and 
Cortese, 1989), Complement C3 (Wilson et al., 1990), IL-8, alpha-1 acid glycoprotein (Prowse and 
Baumann, 1988), alpha-1 antitrypsin, lipoprotein lipase (Zechner et al., 1988), angiotensinogen (Ron
et al., 1991), fibrinogen, c-jun (inducible by phorbol esters, TNF-alpha, UV radiation, retinoic acid,
and hydrogen peroxide), collagenase (induced by phorbol esters and retinoic acid), metallothionein
(heavy metal and glucocorticoid inducible), stromelysin (inducible by phorbol ester, interleukin-1
and EGF), alpha-2 macroglobulin and alpha-1 antichymotrypsin.

It is envisioned that cell cycle regulatable promoters may be useful in the present invention.
For example, in a bi-cistronic gene therapy vector, use of a strong CMV promoter to drive
expression of a first gene such as p16 that arrests cells in the G1 phase could be followed by
expression of a second gene such as p53 under the control of a promoter that is active in the G1
phase of the cell cycle, thus providing a "second hit" that would push the cell into apoptosis. Other
promoters such as those of various cyclins, PCNA, galectin-3, E2F1, p53 and BRCA1 could be used.

Tumor-specific promoters such as osteocalcin, hypoxia-responsive element (HRE),
MAGE-4, CEA, alpha-fetoprotein, GRP78/BiP and tyrosinase also may be used to regulate gene
expression in tumor cells. Other promoters that could be used according to the present invention
include Lac-regulatable, chemotherapy-inducible (e.g., MDR), heat (hyperthermia)-inducible and
radiation-inducible (e.g., EGR (Joki et al., 1995)) promoters, alpha-inhibin, RNA pol III tRNA
met and other amino acid promoters, U1 snRNA (Bartlett et al., 1996), MC-1, PGK, beta-actin and
alpha-globin. Many other promoters that may be useful are listed in Walther and Stein (1996).

It is envisioned that any of the above promoters alone or in combination with another may 
be useful according to the present invention depending on the action desired. 

In addition, this list of promoters should not be considered to be exhaustive or limiting;
those of skill in the art will know of other promoters that may be used in conjunction with the
THAP-family and THAP domain nucleic acids and methods disclosed herein.

1. Enhancers 

Enhancers are genetic elements that increase transcription from a promoter located at a 
distant position on the same molecule of DNA. Enhancers are organized much like promoters. That
is, they are composed of many individual elements, each of which binds to one or more 
transcriptional proteins. The basic distinction between enhancers and promoters is operational. An 
enhancer region as a whole must be able to stimulate transcription at a distance; this need not be 
true of a promoter region or its component elements. On the other hand, a promoter must have one 
or more elements that direct initiation of RNA synthesis at a particular site and in a particular 
orientation, whereas enhancers lack these specificities. Promoters and enhancers are often
overlapping and contiguous, often seeming to have a very similar modular organization.

Below is a list of promoters additional to the tissue specific promoters listed above, cellular 
promoters/enhancers and inducible promoters/enhancers that could be used in combination with the 
nucleic acid encoding a gene of interest in an expression construct (list of enhancers, and Table 1). 
Additionally, any promoter/enhancer combination (as per the Eukaryotic Promoter Data Base,
EPDB) could also be used to drive expression of the gene. Eukaryotic cells can support cytoplasmic
transcription from certain bacterial promoters if the appropriate bacterial polymerase is provided, 
either as part of the delivery complex or as an additional genetic expression construct. 

Suitable enhancers include: Immunoglobulin Heavy Chain; Immunoglobulin Light Chain;
T-Cell Receptor; HLA DQ alpha and DQ beta; beta-Interferon; Interleukin-2; Interleukin-2 Receptor;
MHC Class II 5; MHC Class II HLA-DRalpha; beta-Actin; Muscle Creatine Kinase; Prealbumin
(Transthyretin); Elastase I; Metallothionein; Collagenase; Albumin Gene; alpha-Fetoprotein;
-Globin; beta-Globin; c-fos; c-HA-ras; Insulin; Neural Cell Adhesion Molecule (NCAM);
alpha1-Antitrypsin; H2B (TH2B) Histone; Mouse or Type I Collagen; Glucose-Regulated Proteins
(GRP94 and GRP78); Rat Growth Hormone; Human Serum Amyloid A (SAA); Troponin I (TN I);
Platelet-Derived Growth Factor; Duchenne Muscular Dystrophy; SV40; Polyoma; Retroviruses;
Papilloma Virus; Hepatitis B Virus; Human Immunodeficiency Virus; Cytomegalovirus; and
Gibbon Ape Leukemia Virus.



TABLE 1

Element                                   Inducer
MT II                                     Phorbol Ester (TPA); Heavy metals
MMTV (mouse mammary tumor virus)          Glucocorticoids
beta-Interferon                           poly(rI)x; poly(rc)
Adenovirus 5 E2                           E1a
c-jun                                     Phorbol Ester (TPA), H2O2
Collagenase                               Phorbol Ester (TPA)
Stromelysin                               Phorbol Ester (TPA), IL-1
SV40                                      Phorbol Ester (TPA)
Murine MX Gene                            Interferon, Newcastle Disease Virus
GRP78 Gene                                A23187
alpha-2-Macroglobulin                     IL-6
Vimentin                                  Serum
MHC Class I Gene H-2kb                    Interferon
HSP70                                     E1a, SV40 Large T Antigen
Insulin E Box                             Glucose
Proliferin                                Phorbol Ester (TPA)
Tumor Necrosis Factor                     FMA
Thyroid Stimulating Hormone alpha Gene    Thyroid Hormone



In preferred embodiments of the invention, the expression construct comprises a virus or 
engineered construct derived from a viral genome. The ability of certain viruses to enter cells via 
receptor-mediated endocytosis, to integrate into the host cell genome and to express viral genes
stably and efficiently has made them attractive candidates for the transfer of foreign genes into
mammalian cells (Ridgeway, 1988; Nicolas and Rubenstein, 1988; Baichwal and Sugden, 1986; 
Temin, 1986). The first viruses used as gene vectors were DNA viruses including the papovaviruses 
(simian virus 40, bovine papilloma virus, and polyoma) (Ridgeway, 1988; Baichwal and Sugden, 
1986) and adenoviruses (Ridgeway, 1988; Baichwal and Sugden, 1986). These have a relatively 
low capacity for foreign DNA sequences and have a restricted host spectrum. 

Furthermore, their oncogenic potential and cytopathic effects in permissive cells raise 
safety concerns. They can accommodate only up to 8 kb of foreign genetic material but can be
readily introduced in a variety of cell lines and laboratory animals (Nicolas and Rubenstein, 1988;
Temin, 1986).

(iii) Polyadenylation Signals 

Where a cDNA insert is employed, one will typically desire to include a polyadenylation 
signal to effect proper polyadenylation of the gene transcript. The nature of the polyadenylation 
signal is not believed to be crucial to the successful practice of the invention, and any such 
sequence may be employed such as human or bovine growth hormone and SV40 polyadenylation 
signals. Also contemplated as an element of the expression cassette is a terminator. These elements 
can serve to enhance message levels and to minimize read-through from the cassette into other
sequences. 

Antisense Constructs 

The term "antisense nucleic acid" is intended to refer to the oligonucleotides 
complementary to the base sequences of DNA and RNA. Antisense oligonucleotides, when 
introduced into a target cell, specifically bind to their target nucleic acid and interfere with 
transcription, RNA processing, transport and/or translation. Targeting double-stranded (ds) DNA
with an oligonucleotide leads to triple-helix formation; targeting RNA leads to double-helix
formation.

Antisense constructs may be designed to bind to the promoter and other control regions, 
exons, introns or even exon-intron boundaries of a gene. Antisense RNA constructs, or DNA 
encoding such antisense RNAs, may be employed to inhibit gene transcription or translation or both 
within a host cell, either in vitro or in vivo, such as within a host animal, including a human subject. 
Nucleic acid sequences comprising "complementary nucleotides" are those which are capable of
base-pairing according to the standard Watson-Crick complementarity rules. That is, the larger
purines will base-pair with the smaller pyrimidines to form only combinations of guanine paired
with cytosine (G:C) and adenine paired with either thymine (A:T), in the case of DNA, or adenine
paired with uracil (A:U) in the case of RNA.

As used herein, the terms "complementary" or "antisense sequences" mean nucleic acid 
sequences that are substantially complementary over their entire length and have very few base 
mismatches. For example, nucleic acid sequences of fifteen bases in length may be termed
complementary when they have a complementary nucleotide at thirteen or fourteen positions, that
is, with only single or double mismatches. Naturally, nucleic acid sequences which are "completely
complementary" will be nucleic acid sequences which are entirely complementary throughout their
entire length and have no base mismatches.
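The complementarity definitions above can be made concrete with a short check that aligns an antisense strand antiparallel to its target and counts positions that fail to form a Watson-Crick pair. This is a sketch under stated assumptions: the function names and the `max_mismatches=2` default (mirroring the fifteen-base example, which tolerates single or double mismatches) are illustrative, and wobble pairs and hybridization thermodynamics are ignored.

```python
# Watson-Crick pairs: G:C in DNA and RNA, A:T in DNA, A:U when RNA is involved.
WATSON_CRICK = {
    ("G", "C"), ("C", "G"),
    ("A", "T"), ("T", "A"),
    ("A", "U"), ("U", "A"),
}
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna: str) -> str:
    """Return the completely complementary DNA strand, written 5'->3'."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(dna.upper()))


def mismatches(antisense: str, target: str) -> int:
    """Count unpaired positions when `antisense` (5'->3') is aligned
    antiparallel to an equal-length `target` (also written 5'->3')."""
    if len(antisense) != len(target):
        raise ValueError("sequences must have equal length")
    aligned = zip(antisense.upper(), reversed(target.upper()))
    return sum((a, t) not in WATSON_CRICK for a, t in aligned)


def classify(antisense: str, target: str, max_mismatches: int = 2) -> str:
    """Label a pair using the terminology of the text above."""
    n = mismatches(antisense, target)
    if n == 0:
        return "completely complementary"
    if n <= max_mismatches:
        return "complementary"
    return "not complementary"
```

For a hypothetical fifteen-base target `"ATGCGTACGTTAGCA"`, `reverse_complement` yields `"TGCTAACGTACGCAT"`, which `classify` labels "completely complementary"; the same antisense strand also pairs fully with the RNA form of the target, where U replaces T.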

While all or part of the gene sequence may be employed in the context of antisense 
construction, statistically, any sequence 17 bases long should occur only once in the human genome 
and, therefore, suffice to specify a unique target sequence. 
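The 17-base figure can be checked with a rough calculation. Assume (a simplification) a haploid human genome of about G = 3.1 × 10^9 bp with uniform, independent base composition, and count both strands. The expected number of chance occurrences of one specific L-base sequence is then approximately:

```latex
E(L) \approx \frac{2G}{4^{L}}, \qquad G \approx 3.1 \times 10^{9}\ \text{bp}

E(16) \approx \frac{6.2 \times 10^{9}}{4.29 \times 10^{9}} \approx 1.4,
\qquad
E(17) \approx \frac{6.2 \times 10^{9}}{1.72 \times 10^{10}} \approx 0.36
```

Under these assumptions, 17 is the shortest length for which fewer than one chance match is expected. Real genomes contain repeats and biased base composition, so the uniqueness of a chosen target still has to be checked against the actual sequence.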
Although shorter oligomers are easier to make and increase in vivo accessibility, numerous
other factors are involved in determining the specificity of hybridization. Both binding affinity and
sequence specificity of an oligonucleotide to its complementary target increase with increasing
length. It is contemplated that oligonucleotides of 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or
more bases will be used. One can readily determine whether a given antisense nucleic acid is
effective at targeting the corresponding host cell gene simply by testing the constructs in vitro to
determine whether the endogenous gene's function is affected or whether the expression of related
genes having complementary sequences is affected.

In certain embodiments, one may wish to employ antisense constructs which include other 
elements, for example, those which include C-5 propyne pyrimidines. 

Oligonucleotides which contain C-5 propyne analogues of uridine and cytidine have been 
shown to bind RNA with high affinity and to be potent antisense inhibitors of gene expression 
(Wagner et al., 1993).

Ribozyme Constructs 

As an alternative to targeted antisense delivery, targeted ribozymes may be used. The term 
"ribozyme" refers to an RNA-based enzyme capable of targeting and cleaving particular base 
sequences in oncogene DNA and RNA. Ribozymes either can be targeted directly to cells, in the 
form of RNA oligonucleotides incorporating ribozyme sequences, or introduced into the cell as an
expression construct encoding the desired ribozymal RNA. Ribozymes may be used and applied in 
much the same way as described for antisense nucleic acids. 

Methods of Gene Transfer

In order to mediate the effect of transgene expression in a cell, it will be necessary to 
transfer the therapeutic expression constructs of the present invention into a cell. This section 
provides a discussion of methods and compositions of viral production and viral gene transfer, as
well as non-viral gene transfer methods.

(i) Viral Vector-Mediated Transfer 

The THAP-family gene is incorporated into a viral infectious particle to mediate gene 
transfer to a cell. Additional expression constructs encoding other therapeutic agents as described 
herein may also be transferred via viral transduction using infectious viral particles, for example, by 
transformation with an adenovirus vector of the present invention as described herein below. 
Alternatively, retroviral or bovine papilloma virus vectors may be employed, both of which permit
permanent transformation of a host cell with a gene(s) of interest. Thus, in one example, viral 
infection of cells is used in order to deliver therapeutically significant genes to a cell. Typically, the 
virus simply will be exposed to the appropriate host cell under physiologic conditions, permitting 
uptake of the virus. Though adenovirus is exemplified, the present methods may be advantageously 
employed with other viral or non-viral vectors, as discussed below. 



2. Adenovirus 

Adenovirus is particularly suitable for use as a gene transfer vector because of its mid-sized
DNA genome, ease of manipulation, high titer, wide target-cell range, and high infectivity. The
roughly 36 kb viral genome is bounded by 100-200 base pair (bp) inverted terminal repeats (ITRs),
which contain cis-acting elements necessary for viral DNA replication and packaging. The
genome's transcription units are divided into early (E) and late (L) regions by the onset of viral
DNA replication.

The E1 region (E1A and E1B) encodes proteins responsible for the regulation of
transcription of the viral genome and a few cellular genes. The expression of the E2 region (E2A
and E2B) results in the synthesis of the proteins for viral DNA replication. These proteins are
involved in DNA replication, late gene expression, and host-cell shutoff (Renan, 1990). The
products of the late genes (L1, L2, L3, L4 and L5), including the majority of the
viral capsid proteins, are expressed only after significant processing of a single primary transcript
issued by the major late promoter (MLP). The MLP (located at 16.8 map units) is particularly
efficient during the late phase of infection, and all the mRNAs issued from this promoter possess a
5' tripartite leader (TL) sequence which makes them preferred mRNAs for translation.

In order for adenovirus to be optimized for gene therapy, it is necessary to maximize the
carrying capacity so that large segments of DNA can be included. It is also very desirable to reduce
the toxicity and immunologic reaction associated with certain adenoviral products. The two goals
are, to an extent, coterminous in that elimination of adenoviral genes serves both ends. By practice
of the present invention, it is possible to achieve both these goals while retaining the ability to
manipulate the therapeutic constructs with relative ease.

The replacement of large stretches of viral DNA is possible because the cis elements required for
viral DNA replication are all localized in the inverted terminal repeats (ITRs) (100-200 bp) at either
end of the linear viral genome. Plasmids containing ITRs can replicate in the presence of a
non-defective adenovirus (Hay et al., 1984). Therefore, inclusion of these elements in an adenoviral
vector should permit replication.

In addition, the packaging signal for viral encapsidation is localized between 194 and 385 bp
(0.5-1.1 map units) at the left end of the viral genome (Hearing et al., 1987). This signal mimics the
protein recognition site in bacteriophage λ DNA, where a specific sequence close to the left end, but
outside the cohesive end sequence, mediates the binding to proteins that are required for insertion of
the DNA into the head structure. E1 substitution vectors of Ad have demonstrated that a 450 bp
(0-1.25 map units) fragment at the left end of the viral genome could direct packaging in 293 cells
(Levrero et al., 1991).
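By way of illustration only, the relationship between base-pair coordinates and map units used above can be expressed as a short Python sketch. It assumes, as is conventional, that one map unit equals 1% of the genome length, and it uses the approximate 36 kb genome size stated above rather than the exact length of any particular adenovirus serotype. The function names are hypothetical.

```python
# Illustrative sketch: convert between base-pair (bp) positions and map units (mu)
# for a linear viral genome. Assumption: 1 map unit = 1% of genome length.

ADENOVIRUS_GENOME_BP = 36_000  # approximate ("roughly 36 kb"); not serotype-exact


def bp_to_map_units(position_bp: float, genome_bp: float = ADENOVIRUS_GENOME_BP) -> float:
    """Return the map-unit coordinate (0-100) of a base-pair position."""
    if genome_bp <= 0:
        raise ValueError("genome length must be positive")
    return 100.0 * position_bp / genome_bp


def map_units_to_bp(map_units: float, genome_bp: float = ADENOVIRUS_GENOME_BP) -> float:
    """Return the approximate base-pair position of a map-unit coordinate."""
    return map_units * genome_bp / 100.0


if __name__ == "__main__":
    # Packaging-signal boundaries and the 450 bp fragment cited in the text.
    for bp in (194, 385, 450):
        print(f"{bp} bp ~ {bp_to_map_units(bp):.2f} map units")
    # Major late promoter, cited at 16.8 map units.
    print(f"16.8 map units ~ {map_units_to_bp(16.8):.0f} bp")
```

Under these assumptions, 194 and 385 bp fall at about 0.54 and 1.07 map units, consistent with the 0.5-1.1 map-unit range quoted above, and the 450 bp fragment corresponds to 1.25 map units.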

Previously, it has been shown that certain regions of the adenoviral genome can be
incorporated into the genome of mammalian cells and the genes encoded thereby expressed. These
cell lines are capable of supporting the replication of an adenoviral vector that is deficient in the
adenoviral function encoded by the cell line. There also have been reports of complementation of
replication-deficient adenoviral vectors by "helping" vectors, e.g., wild-type virus or conditionally
defective mutants.

Replication-deficient adenoviral vectors can be complemented, in trans, by helper virus. 
This observation alone does not permit isolation of the replication-deficient vectors, however, since 
the presence of helper virus, needed to provide replicative functions, would contaminate any 
preparation. Thus, an additional element was needed that would add specificity to the replication 
and/or packaging of the replication-deficient vector. That element, as provided for in the present 
invention, derives from the packaging function of adenovirus. 

It has been shown that a packaging signal for adenovirus exists in the left end of the 
conventional adenovirus map (Tibbetts, 1977). Later studies showed that a mutant with a deletion in 
the E1A (194-358 bp) region of the genome grew poorly even in a cell line that complemented the
early (E1A) function (Hearing and Shenk, 1983). When a compensating adenoviral DNA (0-353 bp)
was recombined into the right end of the mutant, the virus was packaged normally. Further 
mutational analysis identified a short, repeated, position-dependent element in the left end of the 
Ad5 genome. One copy of the repeat was found to be sufficient for efficient packaging if present at 
either end of the genome, but not when moved towards the interior of the Ad5 DNA molecule 
(Hearing et al., 1987). 

By using mutated versions of the packaging signal, it is possible to create helper viruses 
that are packaged with varying efficiencies. Typically, the mutations are point mutations or 
deletions. When helper viruses with low efficiency packaging are grown in helper cells, the virus is 
packaged, albeit at reduced rates compared to wild-type virus, thereby permitting propagation of the 
helper. When these helper viruses are grown in cells along with virus that contains wild-type 
packaging signals, however, the wild-type packaging signals are recognized preferentially over the 
mutated versions. Given a limiting amount of packaging factor, the virus containing the wild-type
signals is packaged selectively when compared to the helpers. If the preference is great enough,
stocks approaching homogeneity should be achieved. 
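The selective-packaging argument above can be made concrete with a simple proportional-competition model, sketched below in Python. The model is an assumption introduced for illustration, not a description taken from the cited studies: when packaging factor is limiting, each genome is taken to be packaged in proportion to its copy number multiplied by a relative packaging efficiency (1.0 for a wild-type packaging signal, lower for a mutated one). The efficiency values are hypothetical.

```python
def vector_fraction(n_vector: float, n_helper: float,
                    eff_vector: float = 1.0, eff_helper: float = 0.1) -> float:
    """Fraction of packaged particles that carry the vector genome.

    Simplified assumption: under limiting packaging factor, each genome class is
    packaged in proportion to (copy number x relative packaging efficiency).
    """
    if min(n_vector, n_helper, eff_vector, eff_helper) < 0:
        raise ValueError("copy numbers and efficiencies must be non-negative")
    weighted_vector = n_vector * eff_vector
    weighted_helper = n_helper * eff_helper
    total = weighted_vector + weighted_helper
    if total == 0:
        raise ValueError("no packageable genomes")
    return weighted_vector / total


if __name__ == "__main__":
    # Equal copy numbers; the helper's mutated signal is progressively weaker.
    for eff_helper in (0.5, 0.1, 0.01):
        frac = vector_fraction(1000, 1000, eff_helper=eff_helper)
        print(f"helper efficiency {eff_helper}: vector fraction {frac:.3f}")
```

In this model the vector fraction approaches 1 as the helper's relative packaging efficiency falls, which mirrors the statement that stocks approaching homogeneity should be achieved when the preference is great enough.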

3. Retrovirus 

The retroviruses are a group of single-stranded RNA viruses characterized by an ability to 
convert their RNA to double-stranded DNA in infected cells by a process of reverse-transcription 
(Coffin, 1990). The resulting DNA then stably integrates into cellular chromosomes as a provirus 
and directs synthesis of viral proteins. 

The integration results in the retention of the viral gene sequences in the recipient cell and
its descendants. The retroviral genome contains three genes, gag, pol and env, that code for capsid
proteins, polymerase enzyme, and envelope components, respectively. A sequence found upstream
from the gag gene, termed Ψ (psi), functions as a signal for packaging of the genome into virions. Two
long terminal repeat (LTR) sequences are present at the 5' and 3' ends of the viral genome. These
contain strong promoter and enhancer sequences and also are required for integration in the host
cell genome (Coffin, 1990).

In order to construct a retroviral vector, a nucleic acid encoding a promoter is inserted into
the viral genome in the place of certain viral sequences to produce a virus that is
replication-defective. In order to produce virions, a packaging cell line containing the gag, pol and env genes
but without the LTR and Ψ components is constructed (Mann et al., 1983). When a recombinant
plasmid containing a human cDNA, together with the retroviral LTR and Ψ sequences, is introduced
into this cell line (by calcium phosphate precipitation, for example), the Ψ sequence allows the RNA
transcript of the recombinant plasmid to be packaged into viral particles, which are then secreted
into the culture medium (Nicolas and Rubenstein, 1988; Temin, 1986; Mann et al., 1983). The medium
containing the recombinant retroviruses is collected, optionally concentrated, and used for gene
transfer. Retroviral vectors are able to infect a broad variety of cell types. However, integration and
stable expression of many types of retroviruses require the division of host cells (Paskind et al.,
1975).
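The division of labor between the packaging cell line and the vector can be summarized as a small Boolean model, sketched in Python below for illustration only. It encodes two points from the passage: the gag, pol and env products act in trans, whereas the Ψ signal must be present in cis on the RNA to be packaged. The construct names and element labels are hypothetical, and real packaging involves quantitative effects that this model ignores.

```python
from dataclasses import dataclass

STRUCTURAL_GENES = frozenset({"gag", "pol", "env"})


@dataclass(frozen=True)
class Construct:
    """A DNA construct present in the producer cell, described by its elements."""
    name: str
    elements: frozenset


def packaged_rnas(constructs):
    """Names of constructs whose RNA is packaged in a cell containing all of them.

    gag, pol and env can be supplied in trans by any construct; the psi signal
    must be carried in cis by the RNA that is to be packaged.
    """
    supplied = set()
    for c in constructs:
        supplied |= c.elements & STRUCTURAL_GENES
    if not STRUCTURAL_GENES <= supplied:
        return []  # no virions can be assembled
    return [c.name for c in constructs if "psi" in c.elements]


def replication_competent(construct):
    """A packaged genome can propagate only if it also encodes gag, pol and env."""
    return {"LTR", "psi"} <= construct.elements and STRUCTURAL_GENES <= construct.elements


if __name__ == "__main__":
    packaging_genes = Construct("packaging genes", frozenset({"gag", "pol", "env"}))
    vector = Construct("recombinant vector", frozenset({"LTR", "psi", "cDNA"}))
    print(packaged_rnas([packaging_genes, vector]))  # only the vector RNA is packaged
    print(replication_competent(vector))             # the vector virus is replication-defective
```

Because the packaging construct lacks Ψ and the vector lacks gag, pol and env, only the vector RNA is packaged, and the resulting particles can transduce a target cell once but cannot spread.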

An approach designed to allow specific targeting of retrovirus vectors recently was 
developed based on the chemical modification of a retrovirus by the chemical addition of galactose 
residues to the viral envelope. This modification could permit the specific infection of cells such as 
hepatocytes via asialoglycoprotein receptors, should this be desired. 

A different approach to targeting of recombinant retroviruses was designed in which 
biotinylated antibodies against a retroviral envelope protein and against a specific cell receptor were 
used. The antibodies were coupled via the biotin components by using streptavidin (Roux et al., 
1989). Using antibodies against major histocompatibility complex class I and class II antigens, the 
infection of a variety of human cells that bore those surface antigens was demonstrated with an 
ecotropic virus in vitro (Roux et al., 1989). 

4. Adeno-associated Virus 

Adeno-associated virus (AAV) utilizes a linear, single-stranded DNA genome of about 4700 nucleotides.
Inverted terminal repeats flank the genome. Two genes are present within the genome, giving rise to a
number of distinct gene products. The first, the cap gene, produces three different virion proteins (VP),
designated VP-1, VP-2 and VP-3.

The second, the rep gene, encodes four non-structural proteins (NS). One or more of these 
rep gene products is responsible for transactivating AAV transcription. 

The three promoters in AAV are designated by their location, in map units, in the genome.
These are, from left to right, p5, p19 and p40. Transcription gives rise to six transcripts, two
initiated at each of the three promoters, with one of each pair being spliced.

The splice site, derived from map units 42-46, is the same for each transcript. The four
non-structural proteins apparently are derived from the longer of the transcripts, and the three virion
proteins all arise from the smallest transcript.
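As a worked illustration of the map-unit nomenclature, the Python sketch below converts the promoter and splice-site positions to approximate nucleotide coordinates and enumerates the six transcripts described above. It assumes that one map unit is 1% of the approximately 4700-nucleotide genome; the coordinates it prints are therefore estimates, not annotated sequence positions. The function names are hypothetical.

```python
AAV_GENOME_NT = 4700  # approximate genome length given in the text

# Promoters are named after their map-unit position (1 map unit = 1% of the genome).
PROMOTERS_MU = {"p5": 5, "p19": 19, "p40": 40}
SPLICE_SITE_MU = (42, 46)


def mu_to_nt(map_units: float, genome_nt: int = AAV_GENOME_NT) -> float:
    """Approximate nucleotide coordinate of a map-unit position."""
    return map_units * genome_nt / 100


def enumerate_transcripts():
    """Six transcripts: one unspliced and one spliced transcript per promoter."""
    transcripts = []
    for promoter, mu in PROMOTERS_MU.items():
        for spliced in (False, True):
            transcripts.append({"promoter": promoter,
                                "start_nt": mu_to_nt(mu),
                                "spliced": spliced})
    return transcripts


if __name__ == "__main__":
    start, end = (mu_to_nt(mu) for mu in SPLICE_SITE_MU)
    print(f"shared splice region ~ nt {start:.0f}-{end:.0f}")
    for t in enumerate_transcripts():
        kind = "spliced" if t["spliced"] else "unspliced"
        print(f'{t["promoter"]} (~nt {t["start_nt"]:.0f}): {kind}')
```

With these assumptions, p5, p19 and p40 lie near nucleotides 235, 893 and 1880, and the shared splice region spans roughly nucleotides 1974-2162.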

AAV is not associated with any pathologic state in humans. Interestingly, for efficient
replication, AAV requires "helping" functions from viruses such as herpes simplex virus I and II,
cytomegalovirus, pseudorabies virus and, of course, adenovirus.

The best characterized of the helpers is adenovirus, and many "early" functions for this
virus have been shown to assist with AAV replication. Low-level expression of AAV rep proteins is
believed to hold AAV structural expression in check, and helper virus infection is thought to
remove this block.

The terminal repeats of the AAV vector can be obtained by restriction endonuclease
digestion of AAV or a plasmid such as p201, which contains a modified AAV genome (Samulski et
al., 1987), or by other methods known to the skilled artisan, including but not limited to chemical or
enzymatic synthesis of the terminal repeats based upon the published sequence of AAV. The
ordinarily skilled artisan can determine, by well-known methods such as deletion analysis, the
minimum sequence or part of the AAV ITRs which is required to allow function, i.e., stable and
site-specific integration.

The ordinarily skilled artisan also can determine which minor modifications of the
sequence can be tolerated while maintaining the ability of the terminal repeats to direct stable,
site-specific integration.

AAV-based vectors have proven to be safe and effective vehicles for gene delivery in vitro,
and these vectors are being developed and tested in pre-clinical and clinical stages for a wide range
of applications in potential gene therapy, both ex vivo and in vivo (Carter and Flotte, 1996;
Chatterjee et al., 1995; Ferrari et al., 1996; Fisher et al., 1996; Flotte et al., 1993; Goodman et al.,
1994; Kaplitt et al., 1994, 1996; Kessler et al., 1996; Koeberl et al., 1997; Mizukami et al., 1996;
Xiao et al., 1996).

AAV-mediated efficient gene transfer and expression in the lung has led to clinical trials 
for the treatment of cystic fibrosis (Carter and Flotte, 1996; Flotte et al., 1993). Similarly, the 
prospects for treatment of muscular dystrophy by AAV-mediated gene delivery of the dystrophin 
gene to skeletal muscle, of Parkinson's disease by tyrosine hydroxylase gene delivery to the brain, 
of hemophilia B by Factor IX gene delivery to the liver, and potentially of myocardial infarction by
vascular endothelial growth factor gene delivery to the heart, appear promising since AAV-mediated
transgene expression in these organs has recently been shown to be highly efficient (Fisher et al.,
1996; Flotte et al., 1993; Kaplitt et al., 1994, 1996; Koeberl et al., 1997; McCown et al., 1996; Ping
et al., 1996; and Xiao et al., 1996).

5. Other Viral Vectors 

Other viral vectors may be employed as expression constructs in the present invention.
Vectors derived from viruses such as vaccinia virus (Ridgeway, 1988; Baichwal and Sugden, 1986;
Coupar et al., 1988) and hepatitis B viruses have also been developed and are useful in the present
invention. They offer several attractive features for various mammalian cells (Friedmann, 1989;
Ridgeway, 1988; Baichwal and Sugden, 1986; Coupar et al., 1988; and Horwich et al., 1990).

With the recent recognition of defective hepatitis B viruses, new insight was gained into the 
structure-function relationship of different viral sequences. In vitro studies showed that the virus 
could retain the ability for helper dependent packaging and reverse transcription despite the deletion 
of up to 80% of its genome (Horwich et al., 1990). This suggested that large portions of the genome 
could be replaced with foreign genetic material. Chang et al. recently introduced the
chloramphenicol acetyltransferase (CAT) gene into the duck hepatitis B virus genome in place of
the polymerase, surface, and pre-surface coding sequences. This construct was cotransfected with wild-type
virus into an avian hepatoma cell line. Culture media containing high titers of the recombinant virus 
were used to infect primary duckling hepatocytes. Stable CAT gene expression was detected for at 
least 24 days after transfection (Chang et al., 1991). 

In still further embodiments of the present invention, the nucleic acids to be delivered are 
housed within an infective virus that has been engineered to express a specific binding ligand. The 
virus particle will thus bind specifically to the cognate receptors of the target cell and deliver the 
contents to the cell. A novel approach designed to allow specific targeting of retrovirus vectors was
recently developed based on the chemical modification of a retrovirus by the chemical addition of
lactose residues to the viral envelope. This modification can permit the specific infection of
hepatocytes via asialoglycoprotein receptors.

Another approach to targeting of recombinant retroviruses was designed in which 
biotinylated antibodies against a retroviral envelope protein and against a specific cell receptor were 
used. The antibodies were coupled via the biotin components by using streptavidin (Roux et al.,
1989). Using antibodies against major histocompatibility complex class I and class II antigens,
Roux et al. demonstrated the infection of a variety of human cells that bore those surface antigens
with an ecotropic virus in vitro (Roux et al., 1989).

(ii) Non-viral Transfer

DNA constructs of the present invention are generally delivered to a cell. In certain 
situations, the nucleic acid to be transferred is non-infectious, and can be transferred using non-viral 
methods. 

Several non-viral methods for the transfer of expression constructs into cultured
mammalian cells are contemplated by the present invention. These include calcium phosphate
precipitation (Graham and Van Der Eb, 1973; Chen and Okayama, 1987; Rippe et al., 1990),
DEAE-dextran (Gopal, 1985), electroporation (Tur-Kaspa et al., 1986; Potter et al., 1984), direct
microinjection (Harland and Weintraub, 1985), DNA-loaded liposomes (Nicolau and Sene, 1982;
Fraley et al., 1979), cell sonication (Fechheimer et al., 1987), gene bombardment using high
velocity microprojectiles (Yang et al., 1990), and receptor-mediated transfection (Wu and Wu,
1987; Wu and Wu, 1988).


Once the construct has been delivered into the cell, the nucleic acid encoding the therapeutic
gene may be positioned and expressed at different sites. In certain embodiments, the nucleic acid
encoding the therapeutic gene may be stably integrated into the genome of the cell. This integration
may be in the cognate location and orientation via homologous recombination (gene replacement)
or it may be integrated in a random, non-specific location (gene augmentation). In yet further
embodiments, the nucleic acid may be stably maintained in the cell as a separate, episomal segment
of DNA. Such nucleic acid segments or "episomes" encode sequences sufficient to permit
maintenance and replication independent of or in synchronization with the host cell cycle.

How the expression construct is delivered to a cell and where in the cell the nucleic acid 
remains is dependent on the type of expression construct employed. 

In a particular embodiment of the invention, the expression construct may be entrapped in a 
liposome. Liposomes are vesicular structures characterized by a phospholipid bilayer membrane 
and an inner aqueous medium. Multilamellar liposomes have multiple lipid layers separated by 
aqueous medium. They form spontaneously when phospholipids are suspended in an excess of 
aqueous solution. The lipid components undergo self-rearrangement before the formation of closed
structures and entrap water and dissolved solutes between the lipid bilayers (Ghosh and Bachhawat,
1991). The addition of DNA to cationic liposomes causes a topological transition from liposomes to
optically birefringent liquid-crystalline condensed globules (Radler et al., 1997). These DNA-lipid
complexes are potential non-viral vectors for use in gene therapy.

Liposome-mediated nucleic acid delivery and expression of foreign DNA in vitro has been
very successful. Using the β-lactamase gene, Wong et al. (1980) demonstrated the feasibility of
liposome-mediated delivery and expression of foreign DNA in cultured chick embryo, HeLa, and
hepatoma cells. Nicolau et al. (1987) accomplished successful liposome-mediated gene transfer in
rats after intravenous injection. Also included are various commercial approaches involving
"lipofection" technology.

In certain embodiments of the invention, the liposome may be complexed with the
hemagglutinating virus of Japan (HVJ). This has been shown to facilitate fusion with the cell membrane and
promote cell entry of liposome-encapsulated DNA (Kaneda et al., 1989).

In other embodiments, the liposome may be complexed or employed in conjunction with
nuclear nonhistone chromosomal proteins (HMG-1) (Kato et al., 1991). In yet further embodiments,
the liposome may be complexed or employed in conjunction with both HVJ and HMG-1. Because
such expression constructs have been successfully employed in transfer and expression of nucleic
acid in vitro and in vivo, they are applicable for the present invention.

Other vector delivery systems which can be employed to deliver a nucleic acid encoding a
therapeutic gene into cells are receptor-mediated delivery vehicles. These take advantage of the
selective uptake of macromolecules by receptor-mediated endocytosis in almost all eukaryotic cells.
Because of the cell type-specific distribution of various receptors, the delivery can be highly
specific (Wu and Wu, 1993).

Receptor-mediated gene targeting vehicles generally consist of two components: a cell
receptor-specific ligand and a DNA-binding agent. Several ligands have been used for
receptor-mediated gene transfer. The most extensively characterized ligands are asialoorosomucoid
(ASOR) (Wu and Wu, 1987) and transferrin (Wagner et al., 1990).

Recently, a synthetic neoglycoprotein, which recognizes the same receptor as ASOR, has 
been used as a gene delivery vehicle (Ferkol et al., 1993; Perales et al., 1994) and epidermal growth 
factor (EGF) has also been used to deliver genes to squamous carcinoma cells (Myers, EPO 
0273085). 

In other embodiments, the delivery vehicle may comprise a ligand and a liposome. For
example, Nicolau et al. (1987) employed lactosyl-ceramide, a galactose-terminal asialoganglioside,
incorporated into liposomes and observed an increase in the uptake of the insulin gene by
hepatocytes. Thus, it is feasible that a nucleic acid encoding a therapeutic gene also may be
specifically delivered into a cell type such as prostate, epithelial or tumor cells, by any number of
receptor-ligand systems with or without liposomes. For example, the human prostate-specific
antigen (Watt et al., 1986) may be used as the receptor for mediated delivery of a nucleic acid in
prostate tissue.

In another embodiment of the invention, the expression construct may simply consist of
naked recombinant DNA or plasmids. Transfer of the construct may be performed by any of the
methods mentioned above which physically or chemically permeabilize the cell membrane. This is
particularly applicable for transfer in vitro; however, it may be applied for in vivo use as well.
Dubensky et al. (1984) successfully injected polyomavirus DNA in the form of CaPO4 precipitates
into liver and spleen of adult and newborn mice, demonstrating active viral replication and acute
infection.

Benvenisty and Reshef (1986) also demonstrated that direct intraperitoneal injection of
CaPO4-precipitated plasmids results in expression of the transfected genes. It is envisioned that
DNA encoding a CAM may also be transferred in a similar manner in vivo to express the CAM.

Another embodiment of the invention for transferring a naked DNA expression construct
into cells may involve particle bombardment. This method depends on the ability to accelerate
DNA-coated microprojectiles to a high velocity, allowing them to pierce cell membranes and enter
cells without killing them (Klein et al., 1987). Several devices for accelerating small particles have
been developed. One such device relies on a high-voltage discharge to generate an electrical
current, which in turn provides the motive force (Yang et al., 1990). The microprojectiles used have
consisted of biologically inert substances such as tungsten or gold beads.

Antibodies

Polyclonal anti-THAP-family or anti-THAP domain antibodies can be prepared as
described above by immunizing a suitable subject with a THAP-family or THAP domain
immunogen. The anti-THAP-family or anti-THAP domain antibody titer in the immunized subject
can be monitored over time by standard techniques, such as with an enzyme-linked immunosorbent
assay (ELISA) using immobilized THAP-family or THAP domain protein. If desired, the antibody
molecules directed against THAP-family can be isolated from the mammal (e.g., from the blood)
and further purified by well-known techniques, such as protein A chromatography, to obtain the IgG
fraction. At an appropriate time after immunization, e.g., when the anti-THAP-family antibody
titers are highest, antibody-producing cells can be obtained from the subject and used to prepare
monoclonal antibodies by standard techniques, such as those described in the following references:
the hybridoma technique originally described by Kohler and Milstein (1975) Nature 256:495-497
(see also Brown et al. (1981) J. Immunol. 127:539-46; Brown et al. (1980) J. Biol. Chem.
255:4980-83; Yeh et al. (1976) PNAS 76:2927-31; and Yeh et al. (1982) Int. J. Cancer 29:269-75),
the more recent human B cell hybridoma technique (Kozbor et al. (1983) Immunol. Today 4:72), the
EBV-hybridoma technique (Cole et al. (1985), Monoclonal Antibodies and Cancer Therapy, Alan
R. Liss, Inc., pp. 77-96) or trioma techniques. The technology for producing monoclonal antibody
hybridomas is well known (see generally R. H. Kenneth, in Monoclonal Antibodies: A New
Dimension In Biological Analyses, Plenum Publishing Corp., New York, N.Y. (1980); E. A. Lerner
(1981) Yale J. Biol. Med., 54:387-402; M. L. Gefter et al. (1977) Somatic Cell Genet. 3:231-36).
Briefly, an immortal cell line (typically a myeloma) is fused to lymphocytes (typically splenocytes)
from a mammal immunized with a THAP-family immunogen as described above, and the culture
supernatants of the resulting hybridoma cells are screened to identify a hybridoma producing a
monoclonal antibody that binds THAP-family.
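Monitoring of the antibody titer over time, mentioned above, is commonly reported as an endpoint titer from a serial-dilution ELISA. The Python sketch below illustrates one such calculation; the cutoff rule (mean of background wells plus three standard deviations) is an assumed convention chosen for illustration, and laboratories use a variety of alternative cutoffs. The absorbance values and function name are hypothetical.

```python
from statistics import mean, stdev


def endpoint_titer(reciprocal_dilutions, absorbances, background, n_sd=3.0):
    """Reciprocal of the highest dilution whose absorbance exceeds the cutoff.

    reciprocal_dilutions: e.g. [100, 200, 400, ...] for 1:100, 1:200, 1:400, ...
    absorbances: ELISA readings of the immune serum at those dilutions
    background: readings from pre-immune serum or blank wells (at least two values)
    Cutoff (assumed convention): mean(background) + n_sd * stdev(background).
    Returns 0 when no dilution is above the cutoff.
    """
    if len(reciprocal_dilutions) != len(absorbances):
        raise ValueError("dilutions and absorbances must have the same length")
    cutoff = mean(background) + n_sd * stdev(background)
    positive = [d for d, od in zip(reciprocal_dilutions, absorbances) if od > cutoff]
    return max(positive) if positive else 0


if __name__ == "__main__":
    dilutions = [100, 200, 400, 800, 1600, 3200]
    blanks = [0.05, 0.07, 0.06]
    # Hypothetical readings from two successive bleeds of the same animal.
    bleeds = {"bleed 1": [1.20, 0.80, 0.40, 0.15, 0.07, 0.06],
              "bleed 2": [2.10, 1.80, 1.20, 0.60, 0.25, 0.07]}
    for label, ods in bleeds.items():
        print(label, endpoint_titer(dilutions, ods, blanks))
```

Comparing successive bleeds in this way is one means of judging when titers are highest, which is the point at which antibody-producing cells would be harvested for fusion.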

Any of the many well-known protocols used for fusing lymphocytes and immortalized cell
lines can be applied for the purpose of generating an anti-THAP-family or anti-THAP domain
monoclonal antibody (see, e.g., G. Galfre et al. (1977) Nature 266:550-552; Gefter et al., Somatic Cell
Genet., cited supra; Lerner, Yale J. Biol. Med., cited supra; Kenneth, Monoclonal Antibodies, cited
supra). Moreover, the ordinarily skilled worker will appreciate that there are many variations of
such methods which also would be useful. Typically, the immortal cell line (e.g., a myeloma cell
line) is derived from the same mammalian species as the lymphocytes. For example, murine
hybridomas can be made by fusing lymphocytes from a mouse immunized with an immunogenic
preparation of the present invention with an immortalized mouse cell line. Preferred immortal cell
lines are mouse myeloma cell lines that are sensitive to culture medium containing hypoxanthine,
aminopterin and thymidine ("HAT medium"). Any of a number of myeloma cell lines can be used
as a fusion partner according to standard techniques, e.g., the P3-NS1/1-Ag4-1, P3-x63-Ag8.653 or
Sp2/O-Ag14 myeloma lines. These myeloma lines are available from ATCC. Typically, HAT-sensitive
mouse myeloma cells are fused to mouse splenocytes using polyethylene glycol ("PEG").
Hybridoma cells resulting from the fusion are then selected using HAT medium, which kills
unfused and unproductively fused myeloma cells (unfused splenocytes die after several days
because they are not transformed). Hybridoma cells producing a monoclonal antibody of the
invention are detected by screening the hybridoma culture supernatants for antibodies that bind a
THAP-family or THAP domain protein, e.g., using a standard ELISA assay.
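The logic of HAT selection described above can be expressed as a small Python sketch. It relies on the standard explanation, not spelled out in the text, that HAT-sensitive myeloma lines lack the salvage enzyme hypoxanthine-guanine phosphoribosyltransferase (HGPRT), so that when aminopterin blocks de novo nucleotide synthesis only cells carrying a functional HGPRT gene, contributed by the splenocyte, can survive; immortality is contributed by the myeloma partner. The model ignores chromosome loss after fusion, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    label: str
    has_hgprt: bool  # functional salvage pathway (supplied by the splenocyte genome)
    immortal: bool   # unlimited growth in culture (supplied by the myeloma genome)


def fuse(a: Cell, b: Cell) -> Cell:
    """A fusion product inherits each trait if either parent has it."""
    return Cell(f"{a.label} x {b.label}",
                a.has_hgprt or b.has_hgprt,
                a.immortal or b.immortal)


def grows_in_hat(cell: Cell) -> bool:
    """Long-term growth in HAT medium needs both HGPRT and immortality."""
    return cell.has_hgprt and cell.immortal


if __name__ == "__main__":
    myeloma = Cell("myeloma", has_hgprt=False, immortal=True)
    splenocyte = Cell("splenocyte", has_hgprt=True, immortal=False)
    candidates = [myeloma, splenocyte,
                  fuse(myeloma, myeloma), fuse(myeloma, splenocyte)]
    for c in candidates:
        print(f"{c.label}: {'grows' if grows_in_hat(c) else 'dies'}")
```

Only the myeloma x splenocyte hybrid satisfies both conditions, matching the statement that HAT medium eliminates unfused and unproductively fused myeloma cells while unfused splenocytes die out on their own.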

As an alternative to preparing monoclonal antibody-secreting hybridomas, a monoclonal
anti-THAP-family or anti-THAP domain antibody can be identified and isolated by screening a
recombinant combinatorial immunoglobulin library (e.g., an antibody phage display library) with
THAP-family or THAP domain protein to thereby isolate immunoglobulin library members that
bind THAP-family or THAP domain proteins. Kits for generating and screening phage display
libraries are commercially available (e.g., the Pharmacia Recombinant Phage Antibody System,
Catalog No. 27-9400-01; and the Stratagene SurfZAP™ Phage Display Kit, Catalog No.
240612). Additionally, examples of methods and reagents particularly amenable for use in
generating and screening antibody display libraries can be found in, for example, Ladner et al. U.S.
Pat. No. 5,223,409; Kang et al. PCT International Publication No. WO 92/18619; Dower et al. PCT 
International Publication No. WO 91/17271; Winter et al. PCT International Publication WO 
92/20791; Markland et al. PCT International Publication No. WO 92/15679; Breitling et al. PCT 
International Publication WO 93/01288; McCafferty et al. PCT International Publication No. WO 
92/01047; Garrard et al. PCT International Publication No. WO 92/09690; Ladner et al. PCT 
International Publication No. WO 90/02809; Fuchs et al. (1991) Bio/Technology 9:1370-1372; Hay 
et al. (1992) Hum. Antibod. Hybridomas 3:81-85; Huse et al. (1989) Science 246:1275-1281; 
Griffiths et al. (1993) EMBO J 12:725-734; Hawkins et al. (1992) J. Mol. Biol. 226:889-896; 
Clarkson et al. (1991) Nature 352:624-628; Gram et al. (1992) PNAS 89:3576-3580; Garrard et al.
(1991) Bio/Technology 9:1373-1377; Hoogenboom et al. (1991) Nuc. Acid Res. 19:4133-4137; 
Barbas et al. (1991) PNAS 88:7978-7982; and McCafferty et al. Nature (1990) 348:552-554. 

Additionally, recombinant anti-THAP-family or anti-THAP domain antibodies, such as
chimeric and humanized monoclonal antibodies, comprising both human and non-human portions,
which can be made using standard recombinant DNA techniques, are within the scope of the
invention. Such chimeric and humanized monoclonal antibodies can be produced by recombinant
DNA techniques known in the art, for example using methods described in Robinson et al.
International Application No. PCT/US86/02269; Akira et al. European Patent Application 184,187;
Taniguchi, M., European Patent Application 171,496; Morrison et al. European Patent Application
173,494; Neuberger et al. PCT International Publication No. WO 86/01533; Cabilly et al. U.S. Pat.
No. 4,816,567; Cabilly et al. European Patent Application 125,023; Better et al. (1988) Science
240:1041-1043; Liu et al. (1987) PNAS 84:3439-3443; Liu et al. (1987) J. Immunol. 139:3521-3526;
Sun et al. (1987) PNAS 84:214-218; Nishimura et al. (1987) Cancer Res. 47:999-1005; Wood
et al. (1985) Nature 314:446-449; Shaw et al. (1988) J. Natl. Cancer Inst. 80:1553-1559;
Morrison, S. L. (1985) Science 229:1202-1207; Oi et al. (1986) BioTechniques 4:214; Winter U.S.
Pat. No. 5,225,539; Jones et al. (1986) Nature 321:522-525; Verhoeyen et al. (1988) Science
239:1534; and Beidler et al. (1988) J. Immunol. 141:4053-4060.

An anti-THAP-family or anti-THAP domain antibody (e.g., monoclonal antibody) can be
used to isolate THAP-family or THAP domain protein by standard techniques, such as affinity
chromatography or immunoprecipitation. For example, an anti-THAP-family antibody can facilitate
the purification of natural THAP-family from cells and of recombinantly produced THAP-family
expressed in host cells. Moreover, an anti-THAP-family antibody can be used to detect
THAP-family protein (e.g., in a cellular lysate or cell supernatant) in order to evaluate the abundance
and pattern of expression of the THAP-family protein. Anti-THAP-family antibodies can be used
diagnostically to monitor protein levels in tissue as part of a clinical testing procedure, e.g., to
determine the efficacy of a given treatment regimen. Detection can be facilitated by
coupling (i.e., physically linking) the antibody to a detectable substance. Examples of detectable
substances include various enzymes, prosthetic groups, fluorescent materials, luminescent
materials, bioluminescent materials, and radioactive materials. Examples of suitable enzymes
include horseradish peroxidase, alkaline phosphatase, β-galactosidase, or acetylcholinesterase;
examples of suitable prosthetic group complexes include streptavidin/biotin and avidin/biotin;
examples of suitable fluorescent materials include umbelliferone, fluorescein, fluorescein
isothiocyanate, rhodamine, dichlorotriazinylamine fluorescein, dansyl chloride or phycoerythrin; an
example of a luminescent material includes luminol; examples of bioluminescent materials include
luciferase, luciferin, and aequorin; and examples of suitable radioactive material include 125I, 131I,
35S or 3H.

DRUG SCREENING ASSAYS 

The invention provides a method (also referred to herein as a "screening assay") for 
identifying modulators, i.e., candidate or test compounds or agents (e.g., preferably small 
molecules, but also peptides, peptidomimetics or other drugs) which bind to THAP-family or THAP 
domain proteins, have an inhibitory or activating effect on, for example, THAP-family expression 
or preferably THAP-family activity, or have an inhibitory or activating effect on, for example, the 
activity of a THAP-family target molecule. In some embodiments small molecules can be
generated using combinatorial chemistry or can be obtained from a natural products library. Assays
may be cell-based, non-cell-based or in vivo assays. Drug screening assays may be binding assays
or, more preferably, functional assays, as further described below.

In general, any suitable activity of a THAP-family protein can be detected in a drug 
screening assay, including: (1) mediating apoptosis or cell proliferation when expressed or 
introduced into a cell, most preferably inducing or enhancing apoptosis, and/or most preferably
reducing cell proliferation; (2) mediating apoptosis or cell proliferation of an endothelial cell; (3)
mediating apoptosis or cell proliferation of a hyperproliferative cell; (4) mediating apoptosis or cell
proliferation of a CNS cell, preferably a neuronal or glial cell; (5) an activity indicative of a
biological function in an animal selected from the group consisting of mediating (preferably
inhibiting) angiogenesis, mediating (preferably inhibiting) inflammation, inhibition of the metastatic
potential of cancerous tissue, reduction of tumor burden, increase in sensitivity to chemotherapy or
radiotherapy, killing a cancer cell, inhibition of the growth of a cancer cell, or induction of tumor
regression; or (6) interaction with a THAP-family target molecule or THAP domain target
molecule, preferably interaction with a protein or a nucleic acid.

The invention also provides a method (also referred to herein as a "screening assay") for 
identifying modulators, i.e., candidate or test compounds or agents (e.g., preferably small 
molecules, but also peptides, peptidomimetics or other drugs) which bind to THAP1, PAR4 or
PML-NB proteins and which have an inhibitory or activating effect on: the recruitment of PAR4 or
THAP1 to PML-NBs, or their binding to or association with PML-NBs; the interaction, such as
binding, of SLC with a THAP-family polypeptide; or a cellular response to SLC which is mediated
by a THAP-family polypeptide.

In one embodiment, the invention provides assays for screening candidate or test 
compounds which are target molecules of a THAP family or THAP domain polypeptide, or a 
biologically active fragment or homologue thereof. In another embodiment, the invention provides 
assays for screening candidate or test compounds which bind to or modulate the activity of a THAP 
family or THAP domain polypeptide, or a biologically active fragment or homologue thereof. The 
test compounds of the present invention can be obtained using any of the numerous approaches in 
combinatorial library methods known in the art, including: biological libraries; spatially addressable 
parallel solid phase or solution phase libraries; synthetic library methods requiring deconvolution; 
the 'one-bead one-compound' library method; and synthetic library methods using affinity
chromatography selection. The biological library approach is used with peptide libraries, while the 
other four approaches are applicable to peptide, non-peptide oligomer or small molecule libraries of 
compounds (Lam, K. S. (1997) Anticancer Drug Des. 12:145). 
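In the 'one-bead one-compound' approach, split-and-pool synthesis places a single compound on each bead, so the library size grows as the number of building blocks raised to the number of coupling cycles. The sketch below enumerates a toy library and estimates the fraction of distinct compounds represented when a given number of beads is screened, assuming beads sample compounds uniformly at random and using the Poisson approximation; the building blocks and function names are placeholders, not taken from the text.

```python
import math
from itertools import product

def split_and_pool_library(building_blocks, cycles):
    """Enumerate every sequence produced by `cycles` rounds of split-and-pool."""
    return ["-".join(seq) for seq in product(building_blocks, repeat=cycles)]

def expected_coverage(n_beads, library_size):
    """Approximate expected fraction of distinct compounds present on at least
    one bead, assuming uniform random sampling (Poisson approximation)."""
    return 1.0 - math.exp(-n_beads / library_size)

library = split_and_pool_library(["Ala", "Gly", "Ser"], cycles=3)
print(len(library))                         # 27 compounds (3**3)
print(round(expected_coverage(81, 27), 3))  # 0.95 with a 3-fold bead excess
```

The exponential term explains why bead libraries are usually screened at several-fold excess over the theoretical library size: a 1:1 bead-to-compound ratio would be expected to miss roughly a third of the compounds.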

Examples of methods for the synthesis of molecular libraries can be found in the art, for 
example in: DeWitt et al. (1993) Proc. Natl. Acad. Sci. USA 90:6909; Erb et al. (1994) Proc. Natl.
Acad. Sci. USA 91:11422; Zuckermann et al. (1994) J. Med. Chem. 37:2678; Cho et al. (1993)
Science 261:1303; Carell et al. (1994) Angew. Chem. Int. Ed. Engl. 33:2059; Carell et al. (1994)
Angew. Chem. Int. Ed. Engl. 33:2061; and in Gallop et al. (1994) J. Med. Chem. 37:1233.

Libraries of compounds may be presented in solution (e.g., Houghten (1992) Biotechniques 
13:412-421), or on beads (Lam (1991) Nature 354:82-84), chips (Fodor (1993) Nature 364:555- 
556), bacteria (Ladner U.S. Pat. No. 5,223,409), spores (Ladner U.S. Pat. No. '409), plasmids (Cull 
et al. (1992) Proc Natl Acad Sci USA 89:1865-1869) or on phage (Scott and Smith (1990) Science
249:386-390); (Devlin (1990) Science 249:404-406); (Cwirla et al. (1990) Proc. Natl. Acad. Sci. USA
87:6378-6382); (Felici (1991) J. Mol. Biol. 222:301-310); (Ladner, supra).

Determining the ability of the test compound to inhibit or increase THAP-family 
polypeptide activity can also be accomplished, for example, by coupling the THAP family or THAP 
domain polypeptide, or a biologically active fragment or homologue thereof with a radioisotope or 
enzymatic label such that binding of the THAP family or THAP domain polypeptide, or a 
biologically active fragment or homologue thereof to its cognate target molecule can be determined 
by detecting the labeled THAP family or THAP domain polypeptide, or a biologically active 
fragment or homologue thereof in a complex. For example, compounds (e.g., THAP family or 
THAP domain polypeptide, or a biologically active fragment or homologue thereof) can be labeled
with 125I, 35S, 14C, or 3H, either directly or indirectly, and the radioisotope detected by direct
counting of radioemission or by scintillation counting. Alternatively, compounds can be
enzymatically labeled with, for example, horseradish peroxidase, alkaline phosphatase, or
luciferase, and the enzymatic label detected by determination of conversion of an appropriate
substrate to product. The labeled molecule is placed in contact with its cognate molecule and the
extent of complex formation is measured. For example, the extent of complex formation may be
measured by immunoprecipitating the complex or by performing gel electrophoresis.
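Once the labeled polypeptide has been separated into complexed and free fractions (for example, by immunoprecipitation), the extent of complex formation is simple arithmetic on the counts. This minimal sketch assumes that counts from a no-target control are subtracted as non-specific background and that counting efficiency is the same for both fractions; the function name and numbers are illustrative only.

```python
def fraction_in_complex(cpm_complex, cpm_free, cpm_background=0.0):
    """Fraction of labeled polypeptide specifically recovered in the complex.

    cpm_complex: counts per minute in the complexed (e.g. immunoprecipitated) fraction
    cpm_free: counts per minute remaining in the unbound fraction
    cpm_background: counts in the complexed fraction of a no-target control
    """
    total = cpm_complex + cpm_free  # all labeled polypeptide in the reaction
    if total <= 0:
        raise ValueError("no counts recorded")
    specific = max(cpm_complex - cpm_background, 0.0)
    return specific / total

# Complex formation without and with a test compound.
no_compound = fraction_in_complex(5200, 4800, cpm_background=200)    # 0.5
with_compound = fraction_in_complex(2700, 7300, cpm_background=200)  # 0.25
```

Here the compound halves complex formation, which in this sketch would mark it as a candidate inhibitor of the interaction.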

It is also within the scope of this invention to determine the ability of a compound (e.g., 
THAP family or THAP domain polypeptide, or biologically active fragment or homologue thereof) 
to interact with its cognate target molecule without the labeling of any of the interactants. For 
example, a microphysiometer can be used to detect the interaction of a compound with its cognate 
target molecule without the labeling of either the compound or the target molecule. McConnell, H. 
M. et al. (1992) Science 257:1906-1912. A microphysiometer such as the Cytosensor is an analytical
instrument that measures the rate at which a cell acidifies its environment using a light-addressable 
potentiometric sensor (LAPS). Changes in this acidification rate can be used as an indicator of the 
interaction between compound and cognate target molecule. 
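The microphysiometer readout is a rate: the slope of the medium's pH over a measurement window. A minimal sketch, assuming pH readings taken within a single stop-flow window and an ordinary least-squares slope, is shown below; the data and the function name are hypothetical.

```python
def acidification_rate(times_s, ph_values):
    """Least-squares slope of pH versus time, reported as -dpH/dt per minute
    (positive values mean the cells are acidifying the medium)."""
    n = len(times_s)
    if n < 2 or n != len(ph_values):
        raise ValueError("need at least two paired readings")
    mean_t = sum(times_s) / n
    mean_p = sum(ph_values) / n
    sxx = sum((t - mean_t) ** 2 for t in times_s)
    if sxx == 0:
        raise ValueError("readings must be taken at different times")
    sxy = sum((t - mean_t) * (p - mean_p) for t, p in zip(times_s, ph_values))
    return -(sxy / sxx) * 60.0

# pH readings in one stop-flow window, before and after adding a compound.
baseline = acidification_rate([0, 10, 20, 30], [7.40, 7.39, 7.38, 7.37])
treated = acidification_rate([0, 10, 20, 30], [7.40, 7.38, 7.36, 7.34])
print(treated / baseline)  # a ratio above 1 indicates increased acidification
```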

In a preferred embodiment, the assay comprises contacting a cell which expresses a THAP
family or THAP domain polypeptide, or biologically active fragment or homologue thereof, with a 
THAP-family or THAP domain protein target molecule to form an assay mixture, contacting the 
assay mixture with a test compound, and determining the ability of the test compound to inhibit or 
increase the activity of the THAP family or THAP domain polypeptide, or biologically active 
fragment or homologue thereof, wherein determining the ability of the test compound to inhibit or 
increase the activity of the THAP family or THAP domain polypeptide, or biologically active 
fragment or homologue thereof, comprises determining the ability of the test compound to inhibit or 
increase a biological activity of the THAP-family polypeptide expressing cell. 

In another embodiment, the assay comprises contacting a cell which expresses a THAP 
family or THAP domain polypeptide, or biologically active fragment or homologue thereof, with a 
test compound, and determining the ability of the test compound to inhibit or increase the activity of
the THAP family or THAP domain polypeptide, or biologically active fragment or homologue 
thereof, wherein determining the ability of the test compound to inhibit or increase the activity of 
the THAP family or THAP domain polypeptide, or biologically active fragment or homologue 
thereof, comprises determining the ability of the test compound to inhibit or increase a biological 
activity of the THAP-family polypeptide expressing cell. 

In another preferred embodiment, the assay comprises contacting a cell which is responsive 
to a THAP family or THAP domain polypeptide, or a biologically active fragment or homologue 
thereof, with a THAP-family protein or biologically active portion thereof, to form an assay
mixture, contacting the assay mixture with a test compound, and determining the ability of the test
compound to modulate the activity of the THAP-family protein or biologically active portion
thereof, wherein determining the ability of the test compound to modulate the activity of the
THAP-family protein or biologically active portion thereof comprises determining the ability of the
test compound to modulate a biological activity of the THAP-family polypeptide-responsive cell
(e.g., determining the ability of the test compound to modulate a THAP-family polypeptide activity).

In another embodiment, an assay is a cell-based assay comprising contacting a cell 
expressing a THAP-family target molecule (i.e. a molecule with which THAP-family polypeptide 
interacts) with a test compound and determining the ability of the test compound to modulate (e.g. 
stimulate or inhibit) the activity of the THAP-family target molecule. Determining the ability of the 
test compound to modulate the activity of a THAP-family target molecule can be accomplished, for 
example, by determining the ability of the THAP family or THAP domain polypeptide, or a 
biologically active fragment or homologue thereof to bind to or interact with the THAP-family 
target molecule. 

Determining the ability of the THAP family or THAP domain polypeptide, or a biologically 
active fragment or homologue thereof to bind to or interact with a THAP-family target molecule 
can be accomplished by one of the methods described above for determining direct binding. In a 
preferred embodiment, determining the ability of the THAP family or THAP domain polypeptide, 
or a biologically active fragment or homologue thereof to bind to or interact with a THAP-family 
target molecule can be accomplished by determining the activity of the target molecule. For 
example, the activity of the target molecule can be determined by contacting the target molecule 
with the THAP family or THAP domain polypeptide, or a biologically active fragment or 
homologue thereof and measuring induction of a cellular second messenger of the target (e.g.,
intracellular Ca2+, diacylglycerol, IP3, etc.), detecting catalytic/enzymatic activity of the target on
an appropriate substrate, detecting the induction of a reporter gene (comprising a target-responsive
regulatory element operatively linked to a nucleic acid encoding a detectable marker, e.g.,
luciferase), or detecting a target-regulated cellular response, for example, signal transduction or
protein:protein interactions.
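For the reporter-gene readout, a common practice is to normalize the target-responsive reporter (e.g., firefly luciferase) to a constitutively expressed control reporter in the same well before comparing conditions. That dual-reporter normalization is an assumption of this sketch rather than something stated above, and the readings are invented; the code only illustrates the arithmetic of fold induction.

```python
from statistics import mean

def normalized_activity(reporter, control):
    """Mean ratio of target-responsive reporter signal to control reporter
    signal across replicate wells."""
    if len(reporter) != len(control) or not reporter:
        raise ValueError("need paired, non-empty replicate readings")
    return mean(r / c for r, c in zip(reporter, control))

def fold_induction(treated, untreated):
    """Fold change of normalized reporter activity relative to untreated cells.

    Each argument is a (reporter_readings, control_readings) tuple.
    """
    return normalized_activity(*treated) / normalized_activity(*untreated)

untreated = ([1000, 1100, 900], [500, 550, 450])  # ratio 2.0 in every well
treated = ([4000, 4400, 3600], [500, 550, 450])   # ratio 8.0 in every well
print(fold_induction(treated, untreated))  # 4.0
```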
In yet another embodiment, an assay of the present invention is a cell-free assay in which a
THAP family or THAP domain polypeptide, or a biologically active fragment or homologue thereof
is contacted with a test compound and the ability of the test compound to bind to the THAP family 
or THAP domain polypeptide, or a biologically active fragment or homologue thereof is 
determined. Binding of the test compound to the THAP family or THAP domain polypeptide, or a 
biologically active fragment or homologue thereof can be determined either directly or indirectly as 
described above. In a preferred embodiment, the assay includes contacting the THAP family or 
THAP domain polypeptide, or a biologically active fragment or homologue thereof with a known 
compound which binds THAP-family polypeptide (e.g., a THAP-family target molecule) to form an 
assay mixture, contacting the assay mixture with a test compound, and determining the ability of the 
test compound to interact with a THAP family or THAP domain polypeptide, or a biologically 
active fragment or homologue thereof, wherein determining the ability of the test compound to 
interact with a THAP-family protein comprises determining the ability of the test compound to 
preferentially bind to THAP family or THAP domain polypeptide, or a biologically active fragment 
or homologue thereof as compared to the known compound. 
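One way to put "preferential binding as compared to the known compound" into numbers, assumed here rather than taken from the text, is to convert the test compound's IC50 for displacing the known compound into an inhibition constant with the Cheng-Prusoff relation, Ki = IC50 / (1 + [L]/Kd), and compare that Ki with the known compound's Kd. The relation presumes simple competitive binding at a single site; the concentrations below are hypothetical.

```python
def cheng_prusoff_ki(ic50, ligand_conc, ligand_kd):
    """Inhibition constant of a competitor from its IC50 against a known ligand
    present at ligand_conc with dissociation constant ligand_kd (same units)."""
    return ic50 / (1.0 + ligand_conc / ligand_kd)

def binds_preferentially(ic50, ligand_conc, ligand_kd):
    """True if the test compound's estimated Ki is lower (tighter binding)
    than the known compound's Kd."""
    return cheng_prusoff_ki(ic50, ligand_conc, ligand_kd) < ligand_kd

# Known compound at 10 nM with Kd = 10 nM; test compound IC50 = 8 nM.
print(cheng_prusoff_ki(8.0, 10.0, 10.0))      # 4.0 (nM)
print(binds_preferentially(8.0, 10.0, 10.0))  # True
```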

In another embodiment, the assay is a cell-free assay in which a THAP family or THAP 
domain polypeptide, or a biologically active fragment or homologue thereof is contacted with a test 
compound and the ability of the test compound to modulate (e.g., stimulate or inhibit) the activity of 
the THAP family or THAP domain polypeptide, or a biologically active fragment or homologue 
thereof is determined. Determining the ability of the test compound to modulate the activity of a 
THAP-family protein can be accomplished, for example, by determining the ability of the THAP 
family or THAP domain polypeptide, or a biologically active fragment or homologue thereof to 
bind to a THAP-family target molecule by one of the methods described above for determining 
direct binding. Determining the ability of the THAP family or THAP domain polypeptide, or a 
biologically active fragment or homologue thereof to bind to a THAP-family target molecule can 
also be accomplished using a technology such as real-time Biomolecular Interaction Analysis 
(BIA). Sjolander, S. and Urbaniczky, C. (1991) Anal. Chem. 63:2338-2345 and Szabo et al. (1995) 
Curr. Opin. Struct. Biol. 5:699-705. As used herein, "BIA" is a technology for studying biospecific 
interactions in real time, without labeling any of the interactants (e.g., BIAcore). Changes in the 
optical phenomenon of surface plasmon resonance (SPR) can be used as an indication of real-time 
reactions between biological molecules. 
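BIA sensorgrams are often interpreted with a 1:1 (Langmuir) interaction model. During analyte injection the response approaches an equilibrium level Req exponentially, R(t) = Req(1 - exp(-(ka·C + kd)t)), with Req = Rmax·C/(C + KD) and KD = kd/ka; after the injection the response decays as exp(-kd·t). The model choice and all parameter values below are assumptions for illustration, and real sensorgrams are normally fitted with dedicated software that also checks for mass-transport and heterogeneity effects.

```python
import math

def langmuir_sensorgram(ka, kd, rmax, conc, t_assoc, t_dissoc, step=1.0):
    """Simulated SPR response (RU) for a 1:1 interaction.

    ka: association rate constant (1/(M*s)); kd: dissociation rate constant (1/s)
    rmax: maximal response (RU); conc: analyte concentration (M)
    t_assoc / t_dissoc: durations (s) of the injection and buffer-wash phases
    Returns a list of (time, response) points.
    """
    kobs = ka * conc + kd
    req = rmax * ka * conc / kobs  # equilibrium response, equal to rmax*C/(C+KD)
    points = []
    t = 0.0
    while t <= t_assoc:  # association phase
        points.append((t, req * (1.0 - math.exp(-kobs * t))))
        t += step
    r_end = req * (1.0 - math.exp(-kobs * t_assoc))
    t = step
    while t <= t_dissoc:  # dissociation phase: exponential decay from r_end
        points.append((t_assoc + t, r_end * math.exp(-kd * t)))
        t += step
    return points

# KD = kd/ka = 10 nM, so at C = 10 nM the response tends towards half of rmax.
curve = langmuir_sensorgram(ka=1e5, kd=1e-3, rmax=100.0, conc=1e-8,
                            t_assoc=3000, t_dissoc=600)
```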

In an alternative embodiment, determining the ability of the test compound to modulate the 
activity of a THAP family or THAP domain polypeptide, or a biologically active fragment or 
homologue thereof can be accomplished by determining the ability of the THAP family or THAP 
domain polypeptide, or a biologically active fragment or homologue thereof to further modulate the 
activity of a downstream effector (e.g., a growth factor mediated signal transduction pathway 
component) of a THAP-family target molecule. For example, the activity of the effector molecule 
on an appropriate target can be determined or the binding of the effector to an appropriate target can
be determined as previously described. 

In yet another embodiment, the cell-free assay involves contacting a THAP family or 
THAP domain polypeptide, or a biologically active fragment or homologue thereof with a known 
compound which binds the THAP-family protein to form an assay mixture, contacting the assay 
mixture with a test compound, and determining the ability of the test compound to interact with the 
THAP-family protein, wherein determining the ability of the test compound to interact with the 
THAP-family protein comprises determining the ability of the THAP family or THAP domain 
polypeptide, or a biologically active fragment or homologue thereof to preferentially bind to or 
modulate the activity of a THAP-family target molecule. 

The cell-free assays of the present invention are amenable to the use of soluble and/or
membrane-bound forms of isolated proteins (e.g., THAP family or THAP domain polypeptide, or a
biologically active fragment or homologue thereof, or molecules to which THAP-family targets
bind). In the case of cell-free assays in which a membrane-bound form of an isolated protein is used,
it may be desirable to utilize a solubilizing agent such that the membrane-bound form of the isolated
protein is maintained in solution. Examples of such solubilizing agents include non-ionic detergents
such as n-octylglucoside, n-dodecylglucoside, n-dodecylmaltoside, octanoyl-N-methylglucamide,
decanoyl-N-methylglucamide, Triton® X-100, Triton® X-114, Thesit®, and
isotridecylpoly(ethylene glycol ether)n, and zwitterionic detergents such as
3-[(3-cholamidopropyl)dimethylammonio]-1-propane sulfonate (CHAPS),
3-[(3-cholamidopropyl)dimethylammonio]-2-hydroxy-1-propane sulfonate (CHAPSO), or
N-dodecyl-N,N-dimethyl-3-ammonio-1-propane sulfonate.

In more than one embodiment of the above assay methods of the present invention, it may 
be desirable to immobilize either THAP family or THAP domain polypeptide, or a biologically 
active fragment or homologue thereof or a target molecule thereof to facilitate separation of 
complexed from uncomplexed forms of one or both of the proteins, as well as to accommodate 
automation of the assay. Binding of a test compound to a THAP family or THAP domain 
polypeptide, or a biologically active fragment or homologue thereof, or interaction of a THAP- 
family protein with a target molecule in the presence and absence of a candidate compound, can be 
accomplished in any vessel suitable for containing the reactants. Examples of such vessels include 
microtitre plates, test tubes, and micro-centrifuge tubes. In one embodiment, a fusion protein can be 
provided which adds a domain that allows one or both of the proteins to be bound to a matrix. For 
example, glutathione-S-transferase/THAP-family fusion proteins or glutathione-S-transferase/target 
fusion proteins can be adsorbed onto glutathione sepharose beads (Sigma Chemical, St. Louis, Mo.) 
or glutathione-derivatized microtitre plates, which are then combined with the test compound or the
test compound and either the non-adsorbed target protein or THAP-family protein, and the mixture
incubated under conditions conducive to complex formation (e.g., at physiological conditions for
salt and pH). Following incubation, the beads or microtitre plate wells are washed to remove any
unbound components, the matrix is immobilized in the case of beads, and complex formation is
determined either directly or indirectly, for example, as described above. Alternatively, the complexes can be
dissociated from the matrix, and the level of THAP-family polypeptide binding or activity 
determined using standard techniques. 
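In the GST-capture format, the signal in each well is usually corrected for binding seen with GST alone (no THAP-family moiety) before the effect of a test compound is expressed as percent inhibition. The GST-only background control and the percent-inhibition formula are assumptions of this sketch, and the well readings are invented.

```python
from statistics import mean

def percent_inhibition(with_compound, without_compound, gst_only):
    """Percent reduction in specific complex formation caused by a test compound.

    Each argument is a list of replicate well signals; gst_only wells carry GST
    without the THAP-family moiety and define non-specific binding.
    Negative values mean the compound enhanced complex formation.
    """
    background = mean(gst_only)
    specific_control = mean(without_compound) - background
    if specific_control <= 0:
        raise ValueError("no specific binding in the control wells")
    specific_treated = mean(with_compound) - background
    return 100.0 * (1.0 - specific_treated / specific_control)

inhibition = percent_inhibition([0.60, 0.62, 0.58], [1.10, 1.12, 1.08], [0.10, 0.11, 0.09])
print(round(inhibition, 1))  # about 50 (% inhibition)
```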

Other techniques for immobilizing proteins on matrices can also be used in the screening 
assays of the invention. For example, either a THAP-family protein or a THAP-family target 
molecule can be immobilized utilizing conjugation of biotin and streptavidin. Biotinylated THAP- 
family protein or target molecules can be prepared from biotin-NHS (N-hydroxy-succinimide) 
using techniques well known in the art (e.g., biotinylation kit, Pierce Chemicals, Rockford, Ill.), and
immobilized in the wells of streptavidin-coated 96-well plates (Pierce Chemical). Alternatively,
antibodies reactive with a THAP-family protein or target molecule but which do not interfere with 
binding of the THAP-family protein to its target molecule can be derivatized to the wells of the 
plate, and unbound target or THAP-family protein trapped in the wells by antibody conjugation. 
Methods for detecting such complexes, in addition to those described above for the GST- 
immobilized complexes, include immunodetection of complexes using antibodies reactive with the 
THAP-family protein or target molecule, as well as enzyme-linked assays which rely on detecting 
an enzymatic activity associated with the THAP-family protein or target molecule. 

In another embodiment, modulators of the expression of THAP-family or THAP domain polypeptides
are identified in a method wherein a cell is contacted with a candidate compound and the
expression of THAP-family or THAP domain polypeptides mRNA or protein in the cell is 
determined. The level of expression of THAP-family polypeptide mRNA or protein in the presence 
of the candidate compound is compared to the level of expression of THAP-family polypeptide or 
THAP domain mRNA or protein in the absence of the candidate compound. The candidate 
compound can then be identified as a modulator of THAP-family polypeptide expression based on 
this comparison. For example, when expression of THAP-family polypeptide or THAP domain 
mRNA or protein is greater (statistically significantly greater) in the presence of the candidate 
compound than in its absence, the candidate compound is identified as a stimulator of THAP-family 
polypeptide or THAP domain mRNA or protein expression. Alternatively, when expression of 
THAP-family polypeptide or THAP domain mRNA or protein is less (statistically significantly 
less) in the presence of the candidate compound than in its absence, the candidate compound is 
identified as an inhibitor of THAP-family polypeptide or THAP domain mRNA or protein 
expression. The level of THAP-family polypeptide or THAP domain mRNA or protein expression 
in the cells can be determined by methods described herein for detecting THAP-family polypeptide 
or THAP domain mRNA or protein. 
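The text leaves the choice of statistical test open. As one possibility, an exact two-sided permutation test on the difference of mean expression levels (for example, normalized mRNA or protein measurements from replicate cultures) can decide between "stimulator", "inhibitor" and "no effect". The test, the 0.05 threshold and the data are assumptions of this sketch.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(control, treated):
    """Exact two-sided permutation p-value for a difference in means."""
    pooled = list(control) + list(treated)
    k = len(treated)
    observed = abs(mean(treated) - mean(control))
    extreme = total = 0
    for idx in combinations(range(len(pooled)), k):
        chosen = set(idx)
        t = [pooled[i] for i in idx]
        c = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance so that rounding does not hide ties with the observed split.
        if abs(mean(t) - mean(c)) >= observed - 1e-12:
            extreme += 1
    return extreme / total

def classify_expression_modulator(control, treated, alpha=0.05):
    """Label a candidate compound from replicate expression measurements."""
    if permutation_p_value(control, treated) >= alpha:
        return "no effect"
    return "stimulator" if mean(treated) > mean(control) else "inhibitor"

control = [1.0, 1.1, 0.9, 1.0]
print(classify_expression_modulator(control, [2.0, 2.1, 1.9, 2.2]))  # stimulator
print(classify_expression_modulator(control, [0.5, 0.4, 0.6, 0.5]))  # inhibitor
```

With four replicates per condition the smallest attainable two-sided p-value is 2/70 (about 0.029); with three per condition it is 2/20 = 0.1, so an exact test at the 0.05 level needs at least four replicates each.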

In yet another aspect of the invention, the THAP family or THAP domain polypeptide, or a 
biologically active fragment or homologue thereof can be used as "bait proteins" in a two-hybrid 
assay or three-hybrid assay using the methods described above for use in THAP-family 
polypeptide/PAR4 interaction assays, to identify other proteins which bind to or interact with
THAP-family polypeptide ("THAP-family-binding proteins" or "THAP-family-bp") and are
involved in THAP-family polypeptide activity. Such THAP-family- or THAP domain-binding
proteins are also likely to be involved in the propagation of signals by the THAP-family or THAP
domain proteins or their targets as, for example, downstream
elements of a THAP-family polypeptide- or THAP domain-mediated signaling pathway.
Alternatively, such THAP-family-binding proteins are likely to be THAP-family polypeptide
inhibitors.



THAP/DNA BINDING ASSAYS 

In another embodiment of the invention a method is provided for identifying compounds 
which interfere with THAP-family DNA binding activity, comprising the steps of: contacting a 
THAP-family protein or a portion thereof immobilized on a solid support with both a test 
compound and DNA fragments, or contacting a DNA fragment immobilized on a solid support with 
both a test compound and a THAP-family protein. The binding between DNA and the THAP-family
protein or a portion thereof is detected, wherein a decrease in DNA binding when compared to
DNA binding in the absence of the test compound indicates that the test compound is an inhibitor of 
THAP-family DNA binding activity, and an increase in DNA binding when compared to DNA 
binding in the absence of the test compound indicates that the test compound is an inducer of or 
restores THAP-family DNA binding activity. As discussed further, DNA fragments may be 
selected to be specific THAP-family protein target DNA obtained for example as described in 
Example 20, or may be non-specific THAP-family target DNA. Methods for detecting protein-DNA
interactions are well known in the art, including the commonly used electrophoretic mobility
shift assay (EMSA) and filter binding assays (Zabel et al. (1991) J. Biol. Chem. 266:252; and
Okamoto and Beach (1994) EMBO J. 13:4816). Other assays amenable to high-throughput
detection and quantification of specific and nonspecific DNA binding are also available (Amersham,
N.J.; and Gal S. et al., 6th Ann. Conf. Soc. Biomol. Screening, 6-9 Sept 2000, Vancouver, B.C.).
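In an EMSA readout, binding is commonly expressed as the fraction of labeled probe found in the shifted band. The sketch below applies the decision rule stated above (less binding than the no-compound control indicates an inhibitor, more indicates an inducer) to band intensities; the 20% relative-change cut-off is a hypothetical threshold, not taken from the text.

```python
def fraction_bound(shifted, free):
    """Fraction of labeled DNA probe in the protein-DNA (shifted) band."""
    total = shifted + free
    if total <= 0:
        raise ValueError("no probe signal")
    return shifted / total

def classify_dna_binding_effect(control_bands, compound_bands, min_change=0.2):
    """Compare (shifted, free) band intensities without and with the test compound."""
    ref = fraction_bound(*control_bands)
    test = fraction_bound(*compound_bands)
    if ref == 0:
        # e.g. a mutant protein with no detectable binding in the control lane
        return "inducer" if test > 0 else "no significant effect"
    relative = (test - ref) / ref
    if relative <= -min_change:
        return "inhibitor"
    if relative >= min_change:
        return "inducer"
    return "no significant effect"

print(classify_dna_binding_effect((600, 400), (150, 850)))  # inhibitor
```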

In a first aspect, a screening assay involves identifying compounds which interfere with 
THAP-family DNA binding activity without prior knowledge about specific THAP-family binding 
sequences. For example, a THAP-family protein is contacted with both a test compound and a 
library of oligonucleotides or a sample of DNA fragments not selected based on specific DNA 
sequences. Preferably the THAP-family protein is immobilized on a solid support (such as an array 
or a column). Unbound DNA is separated from DNA which is bound to the THAP-family protein,
and the DNA which is bound to THAP-family protein is detected and can be quantitated by any 
means known in the art. For example, the DNA fragment is labelled with a detectable moiety, such 
as a radioactive moiety, a colorimetric moiety or a fluorescent moiety. Techniques for so labelling 
DNA are well known in the art.

The DNA which is bound to the THAP-family protein or a portion thereof is separated from 
unbound DNA by immunoprecipitation with antibodies which are specific for the THAP-family 
protein or a portion thereof. Use of two different monoclonal anti-THAP-family antibodies may 
result in more complete immunoprecipitation than either one alone. The amount of DNA which is 
in the immunoprecipitate can be quantitated by any means known in the art. THAP-family proteins 
or portions thereof which bind to the DNA can also be detected by gel shift assays (Tan, Cell, 
62:367, 1990), nuclease protection assays, or methylase interference assays. 

It is still another object of the invention to provide methods for identifying compounds 
which restore the ability of mutant THAP-family proteins or portions thereof to bind to DNA 
sequences. In one embodiment a method of screening agents for use in therapy is provided 
comprising: measuring the amount of binding of a THAP-family protein or a portion thereof which 
is encoded by a mutant gene found in cells of a patient to DNA molecules, preferably random 
oligonucleotides or DNA fragments from a nucleic acid library; measuring the amount of binding of 
said THAP-family protein or a portion thereof to said nucleic acid molecules in the presence of a 
test substance; and comparing the amount of binding of the THAP-family protein or a portion 
thereof in the presence of said test substance to the amount of binding of the THAP-family protein 
in the absence of said test substance, a test substance which increases the amount of binding being a 
candidate for use in therapy. 

In another embodiment of the invention, oligonucleotides can be isolated which restore to 
mutant THAP-family proteins or portions thereof the ability to bind to a consensus binding 
sequence or conforming sequences. Mutant THAP-family protein or a portion thereof and random 
oligonucleotides are added to a solid support on which THAP-family-specific DNA fragments are 
immobilized. Oligonucleotides which bind to the solid support are recovered and analyzed. Those 
whose binding to the solid support is dependent on the presence of the mutant THAP-family protein 
are presumptively binding the support by binding to and restoring the conformation of the mutant 
protein. 
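Deciding which recovered oligonucleotides depend on the mutant protein amounts to comparing recovery with and without that protein. A minimal sketch, assuming that recovered oligonucleotides are counted (for example, by sequencing) both in a selection with the mutant protein and in a parallel mock selection without it; the pseudocount, enrichment cut-off and sequences are hypothetical.

```python
def protein_dependent_oligos(counts_with, counts_without,
                             min_enrichment=10.0, pseudocount=1.0):
    """Return oligonucleotides whose recovery depends on the mutant protein.

    counts_with / counts_without: dicts mapping oligonucleotide sequence to the
    number of times it was recovered with and without the mutant protein.
    Counts are normalized to each selection's total before comparison.
    """
    total_with = sum(counts_with.values()) or 1
    total_without = sum(counts_without.values()) or 1
    hits = {}
    for oligo, n_with in counts_with.items():
        n_without = counts_without.get(oligo, 0)
        # The pseudocount avoids division by zero for oligos absent from the mock selection.
        freq_with = (n_with + pseudocount) / total_with
        freq_without = (n_without + pseudocount) / total_without
        ratio = freq_with / freq_without
        if ratio >= min_enrichment:
            hits[oligo] = ratio
    # Most strongly protein-dependent candidates first.
    return sorted(hits, key=hits.get, reverse=True)

with_protein = {"GGACTTCC": 200, "ACGTACGT": 50, "TTTTAAAA": 5}
mock = {"ACGTACGT": 40, "TTTTAAAA": 6, "CCCCGGGG": 54}
print(protein_dependent_oligos(with_protein, mock))  # ['GGACTTCC']
```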

If desired, specific binding can be distinguished from non-specific binding by any means 
known in the art. For example, specific binding interactions are stronger than non-specific binding 
interactions. Thus the incubation mixture can be subjected to any agent or condition which 
destabilizes protein/DNA interactions such that the specific binding reaction is the predominant one 
detected. Alternatively, as taught more specifically below, a non-specific competitor, such as poly(dI-dC),
can be added to the incubation mixture. If the DNA containing the specific binding sites is labelled 
and the competitor is unlabeled, then the specific binding reactions will be the ones predominantly 
detected upon measuring labelled DNA. 

According to another embodiment of the invention, after incubation of THAP-family 
protein or a portion thereof with specific DNA fragments, all components of the cell lysate which do
not bind to the DNA fragments are removed. This can be accomplished, among other ways, by
employing DNA fragments which are attached to an insoluble polymeric support such as agarose, 
cellulose and the like. After binding, all non-binding components can be washed away, leaving 
THAP-family protein or a portion thereof bound to the DNA/solid support. The THAP-family 
protein or a portion thereof can be quantitated by any means known in the art. It can be determined 
using an immunological assay, such as an ELISA, RIA or Western blotting. 

In another embodiment of the invention a method is provided for identifying compounds 
which specifically bind to THAP-family-specific-DNA sequences, comprising the steps of: 
contacting a THAP-family-specific DNA fragment immobilized on a solid support with both a test 
compound and wild-type THAP-family protein or a portion thereof to bind the wild-type THAP- 
family protein or a portion thereof to the DNA fragment; determining the amount of wild-type 
THAP-family protein which is bound to the DNA fragment, wherein inhibition of binding of wild-type
THAP-family protein by the test compound with respect to a control lacking the test compound
suggests binding of the test compound to the THAP-family-specific DNA binding sequences.

It is still another object of the invention to provide methods for identifying compounds 
which restore the ability of mutant THAP-family proteins or portions thereof to bind to specific 
DNA binding sequences. In one embodiment a method of screening agents for use in therapy is 
provided comprising: measuring the amount of binding of a THAP-family protein or a portion 
thereof which is encoded by a mutant gene found in cells of a patient to a DNA molecule which 
comprises more than one monomer of a specific THAP-family target nucleotide sequence; 
measuring the amount of binding of said THAP-family protein to said nucleic acid molecule in the 
presence of a test substance; and comparing the amount of binding of the THAP-family protein in 
the presence of said test substance to the amount of binding of the THAP-family protein or a 
portion thereof in the absence of said test substance, a test substance which increases the amount of 
binding being a candidate for use in therapy. 

In another embodiment of the invention a method is provided for screening agents for use 
in therapy comprising: contacting a transfected cell with a test substance, said transfected cell 
containing a THAP-family protein or a portion thereof which is encoded by a mutant gene found in 
cells of a patient and a reporter gene construct comprising a reporter gene which encodes an 
assayable product and a sequence which conforms to a THAP-family DNA binding site, wherein 
said sequence is upstream from and adjacent to said reporter gene; and determining whether the 
amount of expression of said reporter gene is altered by the test substance, a test substance which 
alters the amount of expression of said reporter gene being a candidate for use in therapy. 

In still another embodiment a method of screening agents for use in therapy is provided 
comprising: adding RNA polymerase, ribonucleotides and a THAP-family protein or a portion
thereof to a transcription construct, said transcription construct comprising a reporter gene which
encodes an assayable product and a sequence which conforms to a THAP-family consensus binding
site, said sequence being upstream from and adjacent to said reporter gene, said step of adding
being effected in the presence and absence of a test substance; determining whether the amount of 
transcription of said reporter gene is altered by the presence of said test substance, a test substance 
which alters the amount of transcription of said reporter gene being a candidate for use in therapy. 

According to the present invention compounds which have THAP-family activity are those 
which specifically complex with a THAP-family-specific DNA binding site. Oligonucleotides and
oligonucleotides containing nucleotide analogs are also contemplated among those compounds
which are able to complex with a THAP-family-specific DNA binding site.

A. Further assays to modulate THAP-family polypeptide activity in vivo 

It will be appreciated that any suitable assay that allows detection of THAP-family 
polypeptide or THAP domain activity can be used. Examples of assays for testing protein 
interaction, nucleic acid binding or modulation of apoptosis in the presence or absence of a test 
compound are further described herein. Thus, the invention encompasses a method of identifying a 
candidate THAP-family polypeptide modulator (e.g. activator or inhibitor), said method 
comprising: 

a) providing a cell comprising a THAP family or THAP domain polypeptide, or a 
biologically active fragment or homolog thereof; 

b) contacting said cell with a test compound; and 

c) determining whether said compound selectively modulates (e.g. activates or inhibits) 
THAP-family polypeptide activity, preferably pro-apoptotic activity, or THAP family or THAP 
domain target binding; wherein a determination that said compound selectively modulates (e.g. 
activates or inhibits) the activity of said polypeptide indicates that said compound is a candidate 
modulator (e.g. activator or inhibitor respectively) of said polypeptide. Preferably, the THAP 
family or THAP domain target is a protein or nucleic acid. 

Preferably the cell is a cell which has been transfected with a recombinant expression 
vector encoding a THAP family or THAP domain polypeptide, or a biologically active fragment or 
homologue thereof. 

Several examples of assays for the detection of apoptosis are described herein, in the 
section titled "Apoptosis assays". Several examples of assays for the detection of THAP family or 
THAP domain target interactions are described herein, including assays for detection of protein 
interactions and nucleic acid binding. 

In one example of an assay for apoptosis activity, a high throughput screening assay for 
molecules that abrogate or stimulate THAP-family polypeptide proapoptotic activity is provided 
based on serum-withdrawal induced apoptosis in a 3T3 cell line with tetracycline-regulated 
expression of a THAP family or THAP domain polypeptide, or a biologically active fragment or 
homologue thereof. Apoptotic cells can be detected by TUNEL labeling in 96- or 384-well 
microplates. A drug screening assay can be carried out along the lines described in Example 13. 
3T3 cells, which have previously been used to analyze the pro-apoptotic activity of PAR4 (Diaz-Meco 
et al., 1996; Berra et al., 1997), can be transfected with expression vectors encoding a THAP-family 
or THAP domain polypeptide, allowing the ectopic expression of the THAP-family polypeptide. 
The apoptotic response to serum withdrawal is then assayed in the presence of a test compound, 
allowing the identification of test compounds that either enhance or inhibit the ability of the 
THAP-family or THAP domain polypeptide to induce apoptosis. Transfected cells are deprived of 
serum, and cells with apoptotic nuclei are counted, for example by DAPI staining and in situ 
TUNEL assays. 
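
The per-well readout of this serum-withdrawal screen can be summarized as in the sketch below, which assumes that DAPI-stained nuclei and TUNEL-positive nuclei have already been counted for each well. The `WellCounts` class, the `classify` helper and its 10-percentage-point margin are hypothetical choices for illustration; the well-naming helper simply assumes the standard 8 × 12 (96-well) and 16 × 24 (384-well) plate layouts.

```python
from dataclasses import dataclass


@dataclass
class WellCounts:
    dapi_nuclei: int      # all nuclei counted by DAPI staining
    tunel_positive: int   # nuclei that are also TUNEL-labeled

    def percent_apoptotic(self) -> float:
        if self.dapi_nuclei <= 0:
            raise ValueError("no nuclei counted in this well")
        return 100.0 * self.tunel_positive / self.dapi_nuclei


def classify(test: WellCounts, vehicle: WellCounts, margin: float = 10.0) -> str:
    """Compare apoptosis after serum withdrawal with and without compound.

    margin is in percentage points and is a hypothetical hit threshold.
    """
    delta = test.percent_apoptotic() - vehicle.percent_apoptotic()
    if delta >= margin:
        return "enhancer"
    if delta <= -margin:
        return "inhibitor"
    return "no effect"


def well_name(index: int, n_wells: int = 96) -> str:
    """Map a 0-based, row-major well index to a name such as 'A1' or 'P24'."""
    n_cols = {96: 12, 384: 24}[n_wells]
    row, col = divmod(index, n_cols)
    return f"{chr(ord('A') + row)}{col + 1}"


vehicle = WellCounts(dapi_nuclei=200, tunel_positive=60)        # 30% apoptotic
print(well_name(5), classify(WellCounts(200, 20), vehicle))     # A6 inhibitor
```

Expressing the readout as a percentage of DAPI-counted nuclei corrects for differences in cell number between wells before test and vehicle wells are compared.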

B. Further THAP-family polypeptide/THAP-target interaction assays 

In the exemplary methods below, THAP/THAP-target interaction assays are described in the context 
of THAP1 and the THAP target Par4. However, it will be appreciated that assays for screening for 
modulators of other THAP family members or THAP domains and other THAP target molecules 
may be carried out by substituting these for THAP1 and Par4 in the methods below. For example, 
in some embodiments, modulators which affect the interaction between a THAP-family polypeptide 
and SLC are identified. 

As shown in Examples 4, 5, 6, and 7 and Figures 3, 4 and 5, the inventors have 
demonstrated by several experimental methods that THAP1 interacts with the pro-apoptotic 
protein Par4. In particular, it has been shown that THAP1 interacts with Par4 wild type (Par4) and 
a Par4 death domain (Par4DD) in a yeast two-hybrid system. Yeast cells were cotransformed with 
BD7-THAP1 and AD7-Par4, AD7, AD7-Par4DD or AD7-Par4) expression vectors. Transformants 
were selected on media lacking histidine and adenine. Identical results were obtained by 
cotransformation of AD7-THAP1 with BD7-Par4, BD7, BD7-Par4DD or BD7-Par4). 

The inventors have also demonstrated in vitro binding of THAP1 to GST-Par4DD. Par4DD 
was expressed as a GST fusion protein, purified on glutathione sepharose and employed as an 
affinity matrix for binding of in vitro translated 35S-methionine-labeled THAP1. GST served as a 
negative control. 

Furthermore, the inventors have shown that THAP1 interacts with both Par4DD and SLC in 
vivo. Myc-Par4DD and GFP-THAP1 expression vectors were cotransfected in primary human 
endothelial cells, and Myc-Par4DD was stained with a monoclonal anti-myc antibody. In the 
resulting images, green fluorescence corresponds to GFP-THAP1 and red fluorescence to Myc-Par4DD. 

The invention thus encompasses assays for the identification of molecules that modulate 
(stimulate or inhibit) THAP-family polypeptide/PAR4 binding. In preferred embodiments, the 
invention includes assays for the identification of molecules that modulate (stimulate or inhibit) 
THAP1/PAR4 binding or THAP1/SLC binding. 

Four examples of high throughput screening assays include: 

1) a two-hybrid-based assay in yeast to find drugs that disrupt interaction of the THAP-family 
bait with PAR4 or SLC as prey; 

2) an in vitro interaction assay using recombinant THAP-family polypeptide and PAR4 or 
SLC proteins; 

3) a chip-based binding assay using recombinant THAP-family polypeptide and PAR4 or 
SLC proteins; and 

4) a fluorescence resonance energy transfer (FRET) cell-based assay using THAP-family 
polypeptide and PAR4 or SLC proteins fused with fluorescent proteins. 
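
For the FRET-based format, one common way to quantify energy transfer is the donor-quenching estimate E = 1 - F_DA/F_D, where F_DA is donor fluorescence with the acceptor-tagged partner present and F_D is donor fluorescence alone. The sketch below assumes that formulation and background-corrected intensities; which fluorescent proteins serve as donor and acceptor is not specified here and is left as an assumption, as are the example intensities.

```python
def fret_efficiency(donor_with_acceptor: float, donor_alone: float) -> float:
    """Donor-quenching FRET efficiency, E = 1 - F_DA / F_D (background-corrected)."""
    if donor_alone <= 0:
        raise ValueError("donor-only intensity must be positive")
    return 1.0 - donor_with_acceptor / donor_alone


def percent_fret_loss(e_compound: float, e_vehicle: float) -> float:
    """Percentage of the vehicle-treated FRET efficiency lost with a test compound."""
    if e_vehicle <= 0:
        raise ValueError("vehicle FRET efficiency must be positive")
    return 100.0 * (e_vehicle - e_compound) / e_vehicle


# Hypothetical intensities: THAP-family donor fusion, PAR4 (or SLC) acceptor fusion.
e_vehicle = fret_efficiency(600.0, 1000.0)    # about 0.40
e_compound = fret_efficiency(850.0, 1000.0)   # about 0.15
print(round(percent_fret_loss(e_compound, e_vehicle), 1))  # 62.5
```

A compound that disrupts the THAP-family polypeptide/PAR4 or SLC interaction would be expected to relieve donor quenching, so F_DA rises toward F_D and E falls.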

The invention thus encompasses a method of identifying a candidate THAP-family 
polypeptide/PAR4 or SLC interaction modulator, said method comprising: 

a) providing a THAP family or THAP domain polypeptide, or a biologically active 
fragment or homologue thereof and a PAR4 or SLC polypeptide or fragment thereof; 

b) contacting said THAP family or THAP domain polypeptide with a test compound; and 

c) determining whether said compound selectively modulates (e.g. activates or inhibits) 
THAP-family/PAR4 or SLC interaction activity. 

Also envisioned is a method comprising: 

a) providing a cell comprising a THAP family or THAP domain polypeptide, or a 
biologically active fragment or homologue thereof and a PAR4 or SLC polypeptide or fragment 
thereof; 

b) contacting said cell with a test compound; and 

c) determining whether said compound selectively modulates (e.g. activates or inhibits) 
THAP-family/PAR4 or SLC interaction activity. 

In general, any suitable assay for the detection of protein-protein interaction may be used. 

In one example, a THAP family or THAP domain polypeptide, or a biologically active 
fragment or homologue thereof can be used as a "bait protein" and a PAR4 or SLC protein can be 
used as a "prey protein" (or vice-versa) in a two-hybrid assay (see, e.g., U.S. Pat. No. 5,283,317; 
Zervos et al. (1993) Cell 72:223-232; Madura et al. (1993) J. Biol. Chem. 268:12046-12054; Bartel 
et al. (1993) Biotechniques 14:920-924; Iwabuchi et al. (1993) Oncogene 8:1693-1696; and Brent 
WO94/10300). The two-hybrid system is based on the modular nature of most transcription factors, 
which consist of separable DNA-binding and activation domains. Briefly, the assay utilizes two 
different DNA constructs. In one construct, the gene that codes for a THAP family or THAP 
domain polypeptide, or a biologically active fragment or homologue thereof ("bait"), is fused to a 
gene encoding the DNA binding domain of a known transcription factor (e.g., GAL-4). In the other 
construct, the gene that codes for a PAR4 or SLC protein, or a biologically active fragment thereof 
("prey" or "sample"), is fused to a gene that codes for the activation domain of the known 
transcription factor. If the "bait" and the "prey" proteins are able to interact in vivo, forming a 
THAP-family polypeptide/PAR4 or SLC complex, the DNA-binding and activation domains of the 
transcription factor are brought into close proximity. This proximity allows transcription of a 
reporter gene (e.g., LacZ) which is operably linked to a transcriptional regulatory site responsive 
to the transcription factor. Expression of the reporter gene can be detected, and cell colonies 
containing the functional transcription factor can be isolated and used to obtain the cloned gene 
which encodes the protein which interacts with the THAP-family protein. This assay can thus be 
carried out in the presence or absence of a test compound, whereby a compound that disrupts the 
THAP-family polypeptide/PAR4 or SLC interaction is detected by reduced or absent transcription 
of the reporter gene. 
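
Run in microplate format, the compound version of this two-hybrid assay is often normalized between a vehicle control (full bait/prey reporter signal) and a no-interaction control (for example, the bait with an empty activation-domain vector). The sketch below assumes that normalization and also computes the Z'-factor, a widely used assay-quality statistic for high throughput screens, from replicate control wells. The choice of controls, the helper names and the example numbers are hypothetical.

```python
from statistics import mean, stdev


def percent_inhibition(sample: float, vehicle_mean: float, no_interaction_mean: float) -> float:
    """0% = vehicle (intact bait/prey interaction); 100% = no-interaction background."""
    window = vehicle_mean - no_interaction_mean
    if window <= 0:
        raise ValueError("vehicle signal must exceed the no-interaction control")
    return 100.0 * (vehicle_mean - sample) / window


def z_prime(positive: list[float], negative: list[float]) -> float:
    """Z'-factor: 1 - 3 * (sd_pos + sd_neg) / |mean_pos - mean_neg| (>= 2 wells each)."""
    return 1.0 - 3.0 * (stdev(positive) + stdev(negative)) / abs(mean(positive) - mean(negative))


# Hypothetical reporter (e.g. beta-galactosidase) readings from control wells.
vehicle_wells = [100.0, 104.0, 96.0, 100.0]
no_interaction_wells = [10.0, 12.0, 8.0, 10.0]
print(round(z_prime(vehicle_wells, no_interaction_wells), 2))                      # 0.84
print(percent_inhibition(55.0, mean(vehicle_wells), mean(no_interaction_wells)))  # 50.0
```

A Z'-factor well above zero indicates that the vehicle and no-interaction signals are separated enough for single-well hit calling; values closer to 1 are better.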

In other examples, in vitro THAP-family polypeptide/PAR4 or SLC interaction assays can 
be carried out, several examples of which are further described herein. For example, a recombinant 
THAP family or THAP domain polypeptide, or a biologically active fragment or homologue thereof 
is contacted with a recombinant PAR4 or SLC protein or biologically active portion thereof, and the 
ability of the PAR4 or SLC protein to bind to the THAP-family protein is determined. Binding of 
the PAR4 or SLC protein to the THAP-family protein can be determined either directly 
or indirectly as described herein. In a preferred embodiment, the assay includes contacting the 
THAP family or THAP domain polypeptide, or a biologically active fragment or homologue thereof 
with a PAR4 or SLC protein which binds a THAP-family protein (e.g., a THAP-family target 
molecule) to form an assay mixture, contacting the assay mixture with a test compound, and 
determining the ability of the test compound to interact with a THAP-family protein, wherein 
determining the ability of the test compound to interact with a THAP-family protein comprises 
determining the ability of the test compound to preferentially bind to THAP-family or biologically 
active portion thereof as compared to the PAR4 or SLC protein. For example, the step of 
determining the ability of the test compound to interact with a THAP-family protein may comprise 
determining the ability of the compound to displace Par4 or SLC from a THAP-family protein/Par4 
or SLC complex thereby forming a THAP-family protein/compound complex. Alternatively, it will 
be appreciated that it is also possible to determine the ability of the test compound to interact with a 
PAR4 or SLC protein, wherein determining the ability of the test compound to interact with a PAR4 
or SLC protein comprises determining the ability of the test compound to preferentially bind to 
PAR4 or SLC or biologically active portion thereof as compared to the THAP-family protein. For 
example, the step of determining the ability of the test compound to interact with a PAR4 or SLC 
protein may comprise determining the ability of the compound to displace the THAP-family protein 
from a THAP-family protein/Par4 or SLC complex, thereby forming a Par4 or SLC/compound 
complex. 
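
When displacement is measured over a range of test-compound concentrations, the resulting curve can be reduced to an IC50 and, for a purely competitive displacer, converted to an inhibition constant with the Cheng-Prusoff relation Ki = IC50 / (1 + [L]/Kd), where [L] and Kd are the concentration and dissociation constant of the labeled binding partner (here PAR4 or SLC). The sketch below assumes competitive binding at equilibrium and uses simple log-linear interpolation rather than curve fitting; all concentrations and constants are hypothetical.

```python
import math


def ic50_by_interpolation(concs_molar, percent_bound):
    """Log-linear interpolation of the concentration at which binding falls to 50%.

    concs_molar must be ascending; percent_bound is the PAR4/SLC signal remaining
    relative to the no-compound control (100%).
    """
    points = list(zip(concs_molar, percent_bound))
    for (c1, b1), (c2, b2) in zip(points, points[1:]):
        if b1 >= 50.0 >= b2 and b1 != b2:
            frac = (b1 - 50.0) / (b1 - b2)
            log_ic50 = math.log10(c1) + frac * (math.log10(c2) - math.log10(c1))
            return 10 ** log_ic50
    raise ValueError("binding does not cross 50% within the tested range")


def cheng_prusoff_ki(ic50, partner_conc, partner_kd):
    """Ki = IC50 / (1 + [L]/Kd); valid only for competitive displacement at equilibrium."""
    return ic50 / (1.0 + partner_conc / partner_kd)


# Hypothetical displacement series.
ic50 = ic50_by_interpolation([1e-8, 1e-7, 1e-6, 1e-5], [95.0, 80.0, 20.0, 5.0])
ki = cheng_prusoff_ki(ic50, partner_conc=100e-9, partner_kd=100e-9)
print(f"IC50 = {ic50:.2e} M, Ki = {ki:.2e} M")  # IC50 = 3.16e-07 M, Ki = 1.58e-07 M
```

The correction matters because the apparent IC50 depends on how much labeled partner is present; Ki allows compounds tested under different assay conditions to be compared.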

Assays to modulate THAP-family polypeptide and/or Par4 trafficking in the PML nuclear 
bodies (PML-NBs) 

As shown in Examples 8 and 9, the inventors have demonstrated by several 
experimental methods that THAP1 and Par4 localize in PML-NBs. 

The inventors demonstrated that THAP1 is a novel protein associated with PML-nuclear 
bodies. Double immunofluorescence staining showed colocalization of THAP1 with PML-NBs 
proteins, PML and Daxx. Primary human endothelial cells were transfected with GFP-THAP1 
expression vector; endogenous PML and Daxx were stained with monoclonal anti-PML and 
polyclonal anti-Daxx antibodies, respectively. 

The inventors also demonstrated by several experiments that Par4 is a novel component of PML-NBs 
that colocalizes with THAP1 in vivo. In one experiment, double 
immunofluorescence staining revealed colocalization of Par4 and PML at PML-NBs in primary 
human endothelial cells or fibroblasts. Endogenous PAR4 and PML were stained with polyclonal 
anti-PAR4 and monoclonal anti-PML antibodies, respectively. In another experiment, double 
staining revealed colocalization of Par4 and THAP1 in cells expressing ectopic GFP-THAP1. 
Primary human endothelial cells or fibroblasts were transfected with GFP-THAP1 expression 
vector; endogenous Par4 was stained with polyclonal anti-PAR4 antibodies. 

The inventors further demonstrated that PML recruits the THAP1/Par4 complex to PML- 
NBs. Triple immunofluorescence staining showed colocalization of THAP1, Par4 and PML in cells 
overexpressing PML and absence of colocalization in cells expressing ectopic Sp100. HeLa cells 
were cotransfected with GFP-THAP1 and HA-PML or HA-SP100 expression vectors; HA-PML or 
HA-SP100 and endogenous Par4 were stained with monoclonal anti-HA and polyclonal anti-Par4 
antibodies, respectively. 

C. Assays to modulate THAP-family protein trafficking in the PML nuclear bodies 

Provided are assays for the identification of drugs that modulate (stimulate or inhibit) 
THAP-family or THAP domain protein, particularly THAP1, binding to PML-NB proteins or 
localization to PML-NBs. In general, any suitable assay for the detection of protein-protein 
interaction may be used. Two examples of high throughput screening assays include 1) a 
two-hybrid-based assay in yeast to find compounds that disrupt interaction of the THAP1 bait with the 
PML-NB protein prey; and 2) in vitro interaction assays using recombinant THAP1 and PML-NB 
proteins. Such assays may be conducted as described above with respect to THAP-family/Par4 
assays except that the PML-NB protein is used in place of Par4. Binding may be detected, for 
example, between a THAP-family protein and a PML protein or a PML-associated protein such as 
Daxx, Sp100, Sp140, p53, pRB, CBP, BLM or SUMO-1. 

Other assays for which standard methods are well known include assays to identify 
molecules that modulate, generally inhibit, the colocalization of THAP1 with PML-NBs. 
Detection can be carried out using a suitable label, such as an anti-THAP1 antibody, and an 
antibody allowing the detection of PML-NB protein. 
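
One standard way to quantify such colocalization is Pearson's correlation coefficient between the pixel intensities of the THAP1 channel and the PML-NB channel within a cell or region of interest; a compound that prevents recruitment of THAP1 to PML-NBs would be expected to lower it relative to vehicle. The minimal sketch below assumes that background-corrected, pixel-aligned intensities have already been extracted from the two images; it is illustrative rather than a prescribed analysis, and the example values are hypothetical.

```python
from math import sqrt


def pearson_colocalization(channel_a, channel_b):
    """Pearson's r between paired pixel intensities of two fluorescence channels."""
    n = len(channel_a)
    if n != len(channel_b) or n < 2:
        raise ValueError("channels must have the same number of pixels (at least 2)")
    mean_a = sum(channel_a) / n
    mean_b = sum(channel_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(channel_a, channel_b))
    var_a = sum((a - mean_a) ** 2 for a in channel_a)
    var_b = sum((b - mean_b) ** 2 for b in channel_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a channel with constant intensity has undefined correlation")
    return cov / sqrt(var_a * var_b)


# Hypothetical pixel intensities: THAP1 (anti-THAP1) and PML (anti-PML) channels.
thap1 = [0, 10, 0, 20, 0]
pml = [1, 11, 1, 21, 1]
print(pearson_colocalization(thap1, pml))  # 1.0
```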

D. Assays to modulate PAR4 trafficking in the PML nuclear bodies 

Provided are assays for the identification of drugs that modulate (stimulate or inhibit) PAR4 
binding to PML-NB proteins or localization to PML-NBs. In general, any suitable assay for the 
detection of protein-protein interaction may be used. Two examples of high throughput screening 
assays include 1) a two-hybrid-based assay in yeast to find compounds that disrupt interaction of the 
PAR4 bait with the PML-NB protein prey; and 2) in vitro interaction assays using recombinant 
PAR4 and PML-NB proteins. Such assays may be conducted as described above with respect to 
THAP-family polypeptide/Par4 assays except that the PML-NB protein is used in place of the 
THAP-family polypeptide. Binding may be detected, for example, between a Par4 protein and a 
PML protein or a PML-associated protein such as Daxx, Sp100, Sp140, p53, pRB, CBP, BLM or 
SUMO-1. 

Other assays for which standard methods are well known include assays to identify 
molecules that modulate, generally inhibit, the colocalization of PAR4 with PML-NBs. Detection 
can be carried out using a suitable label, such as an anti-PAR4 antibody, and an antibody allowing 
the detection of PML-NB protein. 
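
As an alternative to a correlation coefficient, Manders' overlap coefficients report what fraction of one signal lies where the other is present. The sketch below computes M1 (fraction of PAR4 intensity in pixels where the PML-NB channel is above threshold) and M2 (the converse). The thresholds and example intensities are hypothetical, and this is one common variant of the measure rather than a method specified here.

```python
def manders_coefficients(channel_a, channel_b, threshold_a=0.0, threshold_b=0.0):
    """Return (M1, M2) for paired pixel intensities.

    M1: fraction of channel A intensity in pixels where channel B > threshold_b.
    M2: fraction of channel B intensity in pixels where channel A > threshold_a.
    """
    if len(channel_a) != len(channel_b):
        raise ValueError("channels must have the same number of pixels")
    total_a, total_b = sum(channel_a), sum(channel_b)
    if total_a <= 0 or total_b <= 0:
        raise ValueError("both channels need non-zero total intensity")
    m1 = sum(a for a, b in zip(channel_a, channel_b) if b > threshold_b) / total_a
    m2 = sum(b for a, b in zip(channel_a, channel_b) if a > threshold_a) / total_b
    return m1, m2


# Hypothetical pixel intensities: PAR4 (anti-PAR4) and PML (anti-PML) channels.
par4 = [50, 40, 10, 0]
pml = [80, 0, 60, 5]
m1, m2 = manders_coefficients(par4, pml, threshold_a=5, threshold_b=10)
print(round(m1, 2), round(m2, 2))  # 0.6 0.97
```

A compound that inhibits PAR4 recruitment to PML-NBs would be expected to reduce M1 in particular, because less of the PAR4 signal would fall within PML-positive pixels.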

This invention further pertains to novel agents identified by the above-described screening 
assays and to processes for producing such agents by use of these assays. Accordingly, in one 
embodiment, the present invention includes a compound or agent obtainable by a method 
comprising the steps of any one of the aforementioned screening assays (e.g., cell-based assays or 
cell-free assays). For example, in one embodiment, the invention includes a compound or agent 
obtainable by a method comprising contacting a cell which expresses a THAP-family target 
molecule with a test compound and determining the ability of the test compound to bind to, or 
modulate the activity of, the THAP-family target molecule. In another embodiment, the invention 
includes a compound or agent obtainable by a method comprising contacting a cell which expresses 
a THAP-family target molecule with a THAP-family protein or biologically-active portion thereof, 
to form an assay mixture, contacting the assay mixture with a test compound, and determining the 
ability of the test compound to interact with, or modulate the activity of, the THAP-family target 
molecule. In another embodiment, the invention includes a compound or agent obtainable by a 
method comprising contacting a THAP-family protein or biologically active portion thereof with a 
test compound and determining the ability of the test compound to bind to, or modulate (e.g., 
stimulate or inhibit) the activity of, the THAP-family protein or biologically active portion thereof. 
In yet another embodiment, the present invention includes a compound or agent obtainable by a 
method comprising contacting a THAP-family protein or biologically active portion thereof with a 
known compound which binds the THAP-family protein to form an assay mixture, contacting the 
assay mixture with a test compound, and determining the ability of the test compound to interact 
with, or modulate the activity of, the THAP-family protein. 

Accordingly, it is within the scope of this invention to further use an agent identified as 
described herein in an appropriate animal model. For example, an agent identified as described 
herein (e.g., a THAP-family or THAP domain modulating agent, an antisense THAP-family or 
THAP domain nucleic acid molecule, a THAP-family- or THAP domain- specific antibody, or a 
THAP-family- or THAP domain- binding partner) can be used in an animal model to determine the 
efficacy, toxicity, or side effects of treatment with such an agent. Alternatively, an agent identified 
as described herein can be used in an animal model to determine the mechanism of action of such 
an agent. Furthermore, this invention pertains to uses of novel agents identified by the above- 
described screening assays for treatments as described herein. 

The present invention also pertains to uses of novel agents identified by the above- 
described screening assays for diagnoses, prognoses, and treatments as described herein. 
Accordingly, it is within the scope of the present invention to use such agents in the design, 
formulation, synthesis, manufacture, and/or production of a drug or pharmaceutical composition for 
use in diagnosis, prognosis, or treatment, as described herein. For example, in one embodiment, the 
present invention includes a method of synthesizing or producing a drug or pharmaceutical 
composition by reference to the structure and/or properties of a compound obtainable by one of the 
above-described screening assays. For example, a drug or pharmaceutical composition can be 
synthesized based on the structure and/or properties of a compound obtained by a method in which 
a cell which expresses a THAP-family target molecule is contacted with a test compound and the 
ability of the test compound to bind to, or modulate the activity of, the THAP-family target 
molecule is determined. In another exemplary embodiment, the present invention includes a method 
of synthesizing or producing a drug or pharmaceutical composition based on the structure and/or 
properties of a compound obtainable by a method in which a THAP-family protein or biologically 
active portion thereof is contacted with a test compound and the ability of the test compound to bind 
to, or modulate (e.g., stimulate or inhibit) the activity of, the THAP-family protein or biologically 
active portion thereof is determined. 

E. Apoptosis assays 

It will be appreciated that any suitable apoptosis assay may be used to assess the apoptotic 
activity of a THAP family or THAP domain polypeptide, or a biologically active fragment or 
homologue thereof. 

Apoptosis can be recognized by a characteristic pattern of morphological, biochemical, and 
molecular changes. Cells undergoing apoptosis appear shrunken and rounded, and can be 
observed to detach from the culture dish. The morphological changes involve a characteristic 
pattern of condensation of chromatin and cytoplasm which can be readily identified by microscopy. 
When stained with a DNA-binding dye, e.g., Hoechst 33258, apoptotic cells display classic condensed and 
punctate nuclei instead of homogeneous and round nuclei. 

A hallmark of apoptosis is endonucleolysis, a molecular change in which nuclear DNA is 
initially degraded at the linker sections of nucleosomes to give rise to fragments equivalent to single 
and multiple nucleosomes. When these DNA fragments are subjected to gel electrophoresis, they 
reveal a series of DNA bands which are positioned approximately equally distant from each other 
on the gel. The size difference between adjacent bands is about the length of one nucleosome 
repeat, i.e., approximately 180 to 200 base pairs. This characteristic display of the DNA bands is called a DNA 
ladder and it indicates apoptosis of the cell. Apoptotic cells can be identified by flow cytometric 
methods based on measurement of cellular DNA content, increased sensitivity of DNA to 
denaturation, or altered light scattering properties. These methods are well known in the art and are 
within the contemplation of the invention. 
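
The ladder pattern follows from simple arithmetic: fragments are approximately integer multiples of the nucleosome repeat length. The sketch below predicts band sizes and checks whether sized bands from a gel fit that pattern. It assumes a repeat length of about 180 bp (the exact value varies with cell type) and an arbitrary 15% sizing tolerance; both are illustrative assumptions, and the helper names are hypothetical.

```python
def expected_ladder(repeat_bp: int = 180, n_bands: int = 6) -> list[int]:
    """Predicted sizes (bp) of mono-, di-, tri-... nucleosomal fragments."""
    return [repeat_bp * k for k in range(1, n_bands + 1)]


def looks_like_ladder(band_sizes_bp, repeat_bp=180, tolerance=0.15):
    """True if there are at least two bands and each lies near a multiple of repeat_bp."""
    if len(band_sizes_bp) < 2:
        return False
    for size in band_sizes_bp:
        k = round(size / repeat_bp)
        if k < 1 or abs(size - k * repeat_bp) > tolerance * repeat_bp:
            return False
    return True


print(expected_ladder(n_bands=4))                # [180, 360, 540, 720]
print(looks_like_ladder([185, 370, 550, 730]))   # True
print(looks_like_ladder([250, 600]))             # False
```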

Abnormal DNA breaks which are characteristic of apoptosis can be detected by any means 
known in the art. In one preferred embodiment, DNA breaks are labeled with biotinylated dUTP (b- 
dUTP). As described in U.S. Patent No. 5,897,999, cells are fixed and incubated in the presence of 
biotinylated dUTP with either exogenous terminal transferase (terminal DNA transferase assay; 
TdT assay) or DNA polymerase (nick translation assay; NT assay). The biotinylated dUTP is 
incorporated into the chromosome at the places where abnormal DNA breaks are repaired, and is 
detected with fluorescein conjugated to avidin under fluorescence microscopy. 

Assessing THAP-family, THAP domain and PAR4 polypeptide activity 

For assessing the nucleic acids and polypeptides of the invention, the apoptosis indicator 
which is assessed in the screening method of the invention may be substantially any indicator of the 
viability of the cell. By way of example, the viability indicator may be selected from the group 
consisting of cell number, cell refractility, cell fragility, cell size, number of cellular vacuoles, a 
stain which distinguishes live cells from dead cells, methylene blue staining, bud size, bud location, 
nuclear morphology, and nuclear staining. Other viability indicators and combinations of the 
viability indicators described herein are known in the art and may be used in the screening method 
of the invention. 

Cell death status can be evaluated based on DNA integrity. Assays for this determination 
include assaying DNA on an agarose gel to identify DNA breaking into oligonucleosome ladders 
and immunohistochemically detecting the nicked ends of DNA by labeling the free DNA end with 
fluorescein or horseradish peroxidase-conjugated UTP via terminal transferase. Routinely, one can 
also examine nuclear morphology by propidium iodide (PI) staining. All three assays (DNA ladder, 
end-labelling, and PI labelling) are gross measurements and good for those cells that are already 
dead or at the end stage of dying. 

In a preferred example, an apoptosis assay is based on serum-withdrawal induced apoptosis 
in a 3T3 cell line with tetracycline-regulated expression of a THAP family or THAP domain 
polypeptide, or a biologically active fragment or homologue thereof. Detection of apoptotic cells is 
accomplished by TUNEL labeling of cells in 96- or 384-well microplates. This example is further 
described in Example 13. 

In other aspects, assays may test for the generation of cytotoxic death signals, anti-viral 
responses (Tartaglia et al., (1993) Cell 74(5):845-853), and/or the activation of acid 
sphingomyelinase (Wiegmann et al., (1994) Cell 78(6):1005-15) when the THAP-family protein is 
overexpressed or ectopically expressed in cells. Assaying for modulation of apoptosis can also be 
carried out in neuronal cells and lymphocytes, for example, where factor withdrawal is known to 
induce cell suicide, as demonstrated with neuronal cells requiring nerve growth factor to survive 
(Martin, D. P. et al., (1988) J. Cell Biol. 106, 829-844) and lymphocytes depending on a specific 
lymphokine to live (Kyprianou, N. and Isaacs, J. T. (1988) Endocrinology 122:552-562). 
THAP-family or THAP domain polypeptide-marker fusions in cell assays 
In one method, an expression vector encoding a THAP family or THAP domain 
polypeptide, or a biologically active fragment or homologue thereof, can be used to evaluate the 
ability of the polypeptides of the invention to induce apoptosis in cells. If desired, a THAP-family 
or THAP domain polypeptide may be fused to a detectable marker in order to facilitate 
identification of those cells expressing the THAP family or THAP domain polypeptide, or a 
biologically active fragment or homologue thereof. For example, enhanced green fluorescent 
protein (EGFP), a variant of the Aequorea victoria green fluorescent protein (GFP), can be used in 
fusion protein production (CLONTECH Laboratories, Inc., 1020 East Meadow Circle, Palo Alto, 
Calif. 94303), as further described in U.S. Patent No. 6,191,269. 

The THAP-family or THAP domain polypeptide cDNA sequence is fused in-frame to EGFP by 
insertion of the THAP-family or THAP domain polypeptide-encoding cDNA into the SalI-BamHI 
site of plasmid pEGFP-N1 (GenBank Accession # U55762). Cells are transiently transfected by the 
method optimal for the cell being tested (either CaPO4 or Lipofectin). Expression of a THAP-family 
or THAP domain polypeptide and induction of apoptosis are examined using a fluorescence 
microscope at 24 hrs and 48 hrs post-transfection. Apoptosis can be evaluated by the TUNEL 
method (which involves 3' end-labeling of cleaved nuclear DNA) and/or by morphological criteria 
(Cohen et al. (1984) J. Immunol. 132:38-42). Where the screen uses a fusion polypeptide 
comprising a THAP-family or THAP domain polypeptide and a reporter polypeptide (e.g., EGFP), 
apoptosis can be evaluated by detection of nuclear localization of the reporter polypeptide in 
fragmented nuclear bodies or apoptotic bodies. For example, where a THAP-family or THAP 
domain polypeptide-EGFP fusion polypeptide is used, the distribution of THAP-family or THAP 
domain polypeptide-EGFP-associated fluorescence in apoptotic cells would be identical to the 
distribution of DAPI or Hoechst 33342 dyes, which are conventionally used to detect the nuclear 
DNA changes associated with apoptosis (Cohen et al., supra). A minimum of approximately 100 
cells displaying characteristic EGFP fluorescence are evaluated by fluorescence microscopy. 

Apoptosis is scored as nuclear fragmentation, marked apoptotic bodies, and cytoplasmic boiling. 
The characteristics of nuclear fragmentation are particularly visible when THAP-family or THAP 
domain polypeptide-EGFP condenses in apoptotic bodies. 
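
Because the THAP-family or THAP domain cDNA must be fused in frame with EGFP through the SalI-BamHI sites of pEGFP-N1, a quick sequence sanity check before cloning can catch common problems. The sketch below assumes the insert is supplied as the open reading frame from its ATG up to, but not including, its stop codon, and that the linker design (not modeled here) preserves the reading frame into EGFP. It flags internal SalI (GTCGAC) or BamHI (GGATCC) sites, which would interfere with directional cloning, and in-frame stop codons, which would truncate the fusion before EGFP. The function name is hypothetical.

```python
RECOGNITION_SITES = {"SalI": "GTCGAC", "BamHI": "GGATCC"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_fusion_insert(orf: str) -> list[str]:
    """List problems for an ORF (ATG up to, not including, its stop) to be fused to EGFP."""
    seq = orf.upper()
    problems = []
    if not seq.startswith("ATG"):
        problems.append("does not start with ATG")
    if len(seq) % 3 != 0:
        problems.append("length is not a multiple of 3")
    for enzyme, site in RECOGNITION_SITES.items():
        if site in seq:
            problems.append(f"internal {enzyme} site")
    # Read complete codons in frame 1 and look for premature stops.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    if any(codon in STOP_CODONS for codon in codons):
        problems.append("in-frame stop codon would truncate the fusion before EGFP")
    return problems


# Hypothetical sequences, for illustration only.
print(check_fusion_insert("ATGGCTAAAGGT"))   # []
print(check_fusion_insert("ATGGGATCCAAA"))   # ['internal BamHI site']
```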

The ability of the THAP-family or THAP domain polypeptides to undergo nuclear 
localization and to induce apoptosis can be tested by transient expression in 293 human kidney 
cells. If they prove susceptible to THAP-family- or THAP domain-induced apoptosis, 293 cells can 
serve as a convenient initial screen for those THAP-family or THAP domain polypeptides, or 
biologically active fragments or homologues thereof, that will likely also induce apoptosis in other 
cell types (e.g. endothelial cells or cancer cells). In an exemplary protocol, 293 cells are transfected 
with plasmid vectors expressing a THAP-family or THAP domain-EGFP fusion protein. 
Approximately 5 × 10^6 293 cells in 100 mm dishes are transfected with 10 µg of plasmid DNA 
using the calcium phosphate method. The plasmids used comprise a CMV enhancer/promoter and 
the THAP-family or THAP domain-EGFP coding sequence. Apoptosis is evaluated 24 hrs after 
transfection by TUNEL and DAPI staining. Cells transfected with the THAP-family or THAP 
domain-EGFP vector are evaluated by fluorescence microscopy, with typical nuclear aggregation 
of the EGFP marker taken as an indication of apoptosis. If the cells are apoptotic, the distribution 
of EGFP signal in cells expressing THAP-family or THAP domain-EGFP will be identical to the 
distribution of DAPI or Hoechst 33342 dyes, which are conventionally used to detect the nuclear 
DNA changes associated with apoptosis (Cohen et al., supra). 
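
The DNA dose in this protocol can be put in per-cell terms with standard arithmetic: copies = mass / (length × mean mass per base pair) × Avogadro's number. The sketch below assumes the commonly used approximation of about 650 g/mol per base pair of double-stranded DNA and a hypothetical 6,000 bp THAP-EGFP plasmid (the actual construct size is not given here). The result is the DNA applied per cell, not the number of copies that actually enter each cell.

```python
AVOGADRO = 6.02214076e23   # molecules per mole
MEAN_BP_MASS = 650.0       # g/mol per base pair of dsDNA (common approximation)


def plasmid_copies(mass_ug: float, length_bp: int) -> float:
    """Number of plasmid molecules in mass_ug micrograms of a length_bp plasmid."""
    moles = (mass_ug * 1e-6) / (length_bp * MEAN_BP_MASS)
    return moles * AVOGADRO


def copies_per_cell(mass_ug: float, length_bp: int, n_cells: float) -> float:
    """Plasmid molecules applied per cell (an upper bound on what each cell receives)."""
    return plasmid_copies(mass_ug, length_bp) / n_cells


# 10 ug of a hypothetical 6,000 bp plasmid on about 5 x 10^6 cells per 100 mm dish.
print(f"{plasmid_copies(10, 6000):.2e} copies total")            # 1.54e+12 copies total
print(f"{copies_per_cell(10, 6000, 5e6):.2e} copies per cell")   # 3.09e+05 copies per cell
```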

The ability of the THAP family or THAP domain polypeptides, or biologically active 
fragments or homologues thereof to induce apoptosis can also be tested by expression assays in 
human cancer cells, for example as available from NCI. Vector type (for example plasmid, 
retroviral or Sindbis viral) can be selected based on efficiency in a given cell type. After the period 
indicated below, cells are evaluated for morphological signs of apoptosis, including aggregation of 
THAP-family or THAP domain-EGFP into nuclear apoptotic bodies. Cells are counted under a 
fluorescence microscope and scored as to the presence or absence of apoptotic signs, or cells are 
scored by fluorescent TUNEL assay and counted in a flow cytometer. Apoptosis is expressed as a 
percent of cells displaying typical advanced changes of apoptosis. 

Cells from the NCI panel of tumor cells include, for example: 
-colon cancer, expression using a retroviral expression vector, with evaluation of apoptosis 
at 96 hrs post-infection (cell lines KM12; HT-29; SW-620; COLO205; HCT-15; HCC 2998; 
HCT-116); 

-CNS tumors, expression using a retroviral expression vector, with evaluation of apoptosis 
at 96 hrs post-infection (cell lines SF-268, astrocytoma; SF-539, glioblastoma; SNB-19, 
glioblastoma; SNB-75, astrocytoma; and U251, glioblastoma); 

-leukemia cells, expression using a retroviral expression vector, with evaluation of 
apoptosis at 96 hrs post-infection (cell lines CCRF-CEM, acute lymphocytic leukemia (ALL); 
K562, acute myelogenous leukemia (AML); MOLT-4, ALL; SR, immunoblastoma large cell; and 
RPMI 8226, Myeloblastoma); 

-prostate cancer, expression using a retroviral expression vector, with evaluation of 
apoptosis at 96 hrs post-infection (PC-3); 

-kidney cancer, expression using a retroviral expression vector, with evaluation of apoptosis 
at 96 hrs post-infection (cell lines 786-0; UO-31; TK10; ACHN); 

-skin cancer, expression using a retroviral expression vector, with evaluation of apoptosis at 
96 hrs post-infection (Melanoma) (cell lines SKMEL-28; M14; SKMEL-5; MALME-3); 

-lung cancer, expression using a retroviral expression vector, with evaluation of apoptosis 
at 96 hrs post-infection (cell lines HOP-92; NCI-H460; HOP-62; NCI-H522; NCI-H23; A549; 
NCI-H226; EKVX; NCI-H322); 

-breast cancer, expression using a retroviral expression vector, with evaluation of apoptosis 
at 96 hrs post-infection (cell lines MCF-7; T-47D; MCF-7/ADR; MDAMB43; MDAMB23; 
MDA-N; BT-549); 

-ovarian cancer, expression using either a retroviral expression vector and protocol or the
Sindbis viral expression vector and protocol, with evaluation of apoptosis at 96 hrs post-infection
with retrovirus or at 24 hrs post-infection with Sindbis viral vectors (cell lines OVCAR-8; OVCAR-4;
IGROV-1; OVCAR-5; OVCAR-3; SK-OV-3).
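The panel above pairs each tumor type with an expression vector and a fixed evaluation time. As a hypothetical sketch (the dictionary and function names are illustrative assumptions, not part of any described protocol software), the schedule can be encoded as a lookup table:

```python
# Evaluation time points stated in the text, keyed by expression vector.
EVALUATION_HOURS = {"retroviral": 96, "sindbis": 24}

# Vectors listed per tumor type; only the ovarian lines allow the Sindbis alternative.
PANEL_VECTORS = {
    "colon": ("retroviral",),
    "CNS": ("retroviral",),
    "leukemia": ("retroviral",),
    "prostate": ("retroviral",),
    "kidney": ("retroviral",),
    "melanoma": ("retroviral",),
    "lung": ("retroviral",),
    "breast": ("retroviral",),
    "ovarian": ("retroviral", "sindbis"),
}


def evaluation_schedule(tumor_type: str) -> list:
    """Return (vector, hours post-infection) pairs at which apoptosis is scored."""
    return [(vector, EVALUATION_HOURS[vector]) for vector in PANEL_VECTORS[tumor_type]]


print(evaluation_schedule("ovarian"))  # [('retroviral', 96), ('sindbis', 24)]
```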

In a further representative example, the susceptibility of malignant melanoma cells to
apoptosis induced by a THAP family or THAP domain polypeptide, or a biologically active
fragment or homologue thereof, can be tested in several known melanoma cell types: human
melanoma WM 266-4 (ATCC CRL-1676); human malignant melanoma A-375 (ATCC CRL-1619);
human malignant melanoma A2058 (ATCC CRL-11147); human malignant melanoma SK-MEL-31
(ATCC HTB-73); and human malignant melanoma RPMI-7951 (ATCC HTB-66; metastasis to lymph
node). Primary melanoma isolates can also be tested. In addition, human chronic myelogenous
leukemia K-562 cells (ATCC CCL-243) and 293 human kidney cells (ATCC CRL-1573; transformed
primary embryonal cells) are tested. Normal human primary dermal fibroblasts and Rat-1
fibroblasts serve as controls. All melanoma cell lines are considered metastatic because they were
isolated from metastases or metastatic nodules. A transient expression strategy is used in order to
evaluate induction of THAP-family or THAP domain polypeptide-mediated apoptosis without
artifacts associated with prolonged selection. An expression vector encoding the THAP-family or
THAP domain polypeptide-EGFP fusion protein described below can be used in order to facilitate
identification of those cells expressing the THAP-family or THAP domain polypeptide. Cells are
transiently transfected by the method optimal for the cell being tested (either CaPO4 or Lipofectin).
Expression of the THAP-family or THAP domain polypeptide and induction of apoptosis are examined
using a fluorescence microscope at 24 hrs and 48 hrs post-transfection. A minimum of
approximately 100 cells displaying characteristic EGFP fluorescence are evaluated by
fluorescence microscopy. Apoptosis is scored as nuclear fragmentation, marked apoptotic bodies,
and cytoplasmic boiling. Nuclear fragmentation is particularly visible when the THAP-family or
THAP domain polypeptide-EGFP condenses in apoptotic bodies.
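The scoring scheme above can be sketched as follows, assuming each cell is recorded with its EGFP status and the three morphological criteria as yes/no calls. Treating any single criterion as sufficient, and enforcing 100 cells as a hard minimum rather than "approximately 100", are assumptions of this sketch; `ScoredCell` and `score_timepoint` are hypothetical names. The same routine would apply unchanged to the endothelial cell assay that follows.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumption: "a minimum of approximately 100" EGFP-positive cells is enforced as a hard floor.
MIN_EGFP_CELLS = 100


@dataclass
class ScoredCell:
    egfp_positive: bool  # expresses the THAP-EGFP fusion
    nuclear_fragmentation: bool
    apoptotic_bodies: bool
    cytoplasmic_boiling: bool


def is_apoptotic(cell: ScoredCell) -> bool:
    # Assumption: any one of the three morphological criteria marks the cell as apoptotic.
    return cell.nuclear_fragmentation or cell.apoptotic_bodies or cell.cytoplasmic_boiling


def score_timepoint(cells: list[ScoredCell]) -> float:
    """Percent apoptosis among EGFP-positive cells at one time point (24 h or 48 h)."""
    expressing = [c for c in cells if c.egfp_positive]
    if len(expressing) < MIN_EGFP_CELLS:
        raise ValueError(
            f"only {len(expressing)} EGFP-positive cells scored; need {MIN_EGFP_CELLS}"
        )
    n_apoptotic = sum(is_apoptotic(c) for c in expressing)
    return 100.0 * n_apoptotic / len(expressing)


# Hypothetical data: 100 EGFP-positive cells, 25 with fragmented nuclei,
# plus 10 untransfected (EGFP-negative) cells that are ignored even if apoptotic.
cells = [ScoredCell(True, i < 25, False, False) for i in range(100)]
cells += [ScoredCell(False, True, True, True) for _ in range(10)]
print(score_timepoint(cells))  # 25.0
```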

In a further example, the susceptibility of endothelial cells to apoptosis induced by a THAP
family or THAP domain polypeptide, or a biologically active fragment or homologue thereof, can be
tested in several known endothelial cell types: HUVEC (human umbilical vein endothelial cells,
Cat. No. CC-2519), HMVEC-L (human microvascular endothelial cells from the lung, Cat. No.
CC-2527) and HMVEC-d (human microvascular endothelial cells from the dermis, Cat. No. CC-2543),
all from BioWhittaker-Clonetics, 8830 Biggs Ford Road, Walkersville, MD 21793-0127. These and
other endothelial cell types may be useful as models indicating the ability of THAP-family or THAP
domain polypeptides to induce apoptosis in therapeutic strategies for the regulation of angiogenesis.
A transient expression strategy is used in order to evaluate induction of THAP-family or THAP
domain polypeptide-mediated apoptosis without artifacts associated with prolonged selection. An
expression vector encoding the THAP-family or THAP domain polypeptide-EGFP fusion protein
described below can be used in order to facilitate identification of those cells expressing the
THAP-family or THAP domain polypeptide. Cells are transiently transfected by the method optimal
for the cell being tested (either CaPO4 or Lipofectin). Expression of the THAP-family or THAP
domain polypeptide and induction of apoptosis are examined using a fluorescence microscope at 24
hrs and 48 hrs post-transfection. A minimum of approximately 100 cells displaying
characteristic EGFP fluorescence are evaluated by fluorescence microscopy. Apoptosis is scored as
nuclear fragmentation, marked apoptotic bodies, and cytoplasmic boiling. Nuclear fragmentation is
particularly visible when the THAP-family or THAP domain polypeptide-EGFP condenses in
apoptotic bodies.

In another example, a transient transfection assay similar to that previously described for
detecting apoptosis induced by interleukin-1-beta-converting enzyme can be used (Miura et al., Cell
75: 653-660 (1993); Kumar et al., Genes Dev. 8: 1613-1626 (1994); Wang et al., Cell 78: 739-750
(1994); and U.S. Patent No. 6,221,615). One day prior to transfection, cells (for example Rat-1
cells) are plated in 24-well dishes at 3.5 × 10⁴ cells/well. The following day, the cells are
transfected with a marker plasmid encoding beta-galactosidase, in combination with an expression
plasmid encoding a THAP-family or THAP domain polypeptide, by the Lipofectamine procedure
(Gibco/BRL). At 24 hours post-transfection, cells are fixed and stained with X-Gal to detect
beta-galactosidase expression in cells that received plasmid DNA (Miura et al., supra). Blue cells
are counted by microscopic examination and scored as either live (flat blue cells) or dead
(round blue cells). The cell killing activity of the THAP-family or THAP domain polypeptide in
this assay is manifested by a large reduction in the number of blue cells relative to
co-transfection of the beta-gal plasmid with a control expression vector (i.e., with no
THAP-family or THAP domain polypeptide cDNA insert).
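The readout of this co-transfection assay is a comparison of blue-cell counts between the THAP expression plasmid and the empty control vector, plus the live/dead split among blue cells. A minimal Python sketch follows. The function names and counts are hypothetical. Whether total blue cells or only flat (live) blue cells are compared is left to the experimenter and is an assumption here.

```python
def fraction_round(flat_blue: int, round_blue: int) -> float:
    """Fraction of beta-gal-positive (blue) cells scored as dead (round)."""
    total = flat_blue + round_blue
    if total == 0:
        raise ValueError("no blue cells were counted")
    return round_blue / total


def percent_reduction(test_blue: int, control_blue: int) -> float:
    """Percent fewer blue cells with the THAP plasmid than with the empty control vector."""
    if control_blue <= 0:
        raise ValueError("control_blue must be positive")
    return 100.0 * (1.0 - test_blue / control_blue)


# Hypothetical counts: empty vector gives 400 blue cells, THAP plasmid gives 60.
print(round(percent_reduction(60, 400), 1))  # 85.0
print(fraction_round(45, 15))                # 0.25
```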

In yet another example, beta-galactosidase co-transfection assays can be used for
determination of cell death. The assay is performed as described (Hsu, H. et al. (1995) Cell 81,
495-504; Hsu, H. et al. (1996a) Cell 84, 299-308; Hsu, H. et al. (1996b) Immunity 4, 387-396; and
U.S. Patent No. 6,242,569). Transfected cells are stained with X-gal as described in Shu, H. B. et al.
(1995) J. Cell Sci. 108, 2955-2962. The number of blue cells in 8 viewing fields of a 35 mm
dish is determined by counting. The average number from one representative experiment is shown.

Assays for apoptosis can also be carried out by making use of any suitable biological 
marker of apoptosis. Several methods are described as follows. 

In one aspect, fluorocytometric studies of cell death status can be carried out. The technology
used in fluorocytometric studies identifies cells in three different phases of the cell cycle:
G1, S and G2. This is largely performed by staining DNA quantity with propidium iodide.
Since dying cells contain the same DNA quantity as their living counterparts at any of the three
phases of the cell cycle, DNA staining alone cannot distinguish the two cell populations.
One can instead perform double labelling for positivity for a biological marker of apoptosis
(e.g. terminin Tp30, U.S. Patent No. 5,783,667) together with propidium iodide (PI) staining. The
labelling indices for the biological marker of apoptosis and for PI staining can then be used in
combination to obtain the exact fractions of cells in G1 that are living and dying. Similar
estimations can be made for the S-phase and G2-phase cell populations.
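The double-labelling logic can be sketched as a per-cell classification. PI intensity assigns the cell-cycle phase, and the apoptosis marker then decides living versus dying within that phase. The gate values `G1_UPPER` and `G2_LOWER`, the normalization of PI intensity to the G1 peak, and the event data below are hypothetical assumptions; in practice gates are set from the measured DNA histogram.

```python
from collections import Counter

# Hypothetical PI gates, expressed as DNA content relative to the G1 peak
# (G1 = 1.0, i.e. 2N; G2 = 2.0, i.e. 4N).
G1_UPPER = 1.25
G2_LOWER = 1.75


def cell_cycle_phase(dna_ratio: float) -> str:
    if dna_ratio < G1_UPPER:
        return "G1"
    if dna_ratio >= G2_LOWER:
        return "G2"
    return "S"


def dying_fraction_by_phase(cells):
    """cells: iterable of (dna_ratio, marker_positive) pairs, one per measured cell."""
    totals, dying = Counter(), Counter()
    for dna_ratio, marker_positive in cells:
        phase = cell_cycle_phase(dna_ratio)
        totals[phase] += 1
        if marker_positive:
            dying[phase] += 1
    # The living fraction in each phase is 1 minus the dying fraction.
    return {phase: dying[phase] / totals[phase] for phase in totals}


events = [(1.0, False), (1.02, True), (0.98, False), (0.95, False),
          (1.5, True), (1.4, False), (2.0, False), (1.95, True)]
print(dying_fraction_by_phase(events))  # {'G1': 0.25, 'S': 0.5, 'G2': 0.5}
```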

In this assay, the cells are fixed with formaldehyde and extracted with 0.05%
Triton. The cell specimens are then incubated with a monoclonal antibody to a marker of
apoptosis overnight at room temperature or for one hour at 37 °C. This is followed by
incubation with fluoresceinated goat anti-mouse antibody and then by propidium
iodide staining. The fully processed cell specimens are then evaluated by fluorocytometric
measurement of both fluorescein (marker of apoptosis) and rhodamine (PI) labelling intensity on
a per-cell basis, in the same cell population simultaneously.

In another aspect, it is possible to assess the inhibitory effect on cell growth of therapeutic
induction of apoptosis. One routine method to determine whether a particular chemotherapeutic
drug can inhibit cancerous cell growth is to examine cell population size, either in culture, by
measuring the reduction in cell colony size or number, or by measuring soft agar colony growth or in
vivo tumor formation in nude mice. These procedures require time for the colonies or tumors
to become large enough to be detectable. Experiments involved in these approaches generally
require large-scale planning and multiple repeats of a lengthy experimental span (at least three
weeks). Often these assays do not take into account the fact that a drug may not be inhibiting cell
growth, but rather killing the cells, a more favorable outcome for chemotherapeutic
treatment of cancer. Thus, assays for the assessment of apoptotic activity can involve using a
biological or biochemical marker specific for quiescent, non-cycling or non-proliferating cells. For
example, a monoclonal antibody can be used to assess the non-proliferating population of cells in a
given tissue, which indirectly gives a measure of the proliferating component of a tumor or cell
mass. This detection can be combined with a biological or biochemical marker (e.g. antibodies) to
detect the dying cell population pool, providing a powerful and rapid assessment of the
effectiveness of any given drug in the containment of cancerous cell growth. Applications can be
easily performed at the immunofluorescence microscopic level with cultured cells or tissue sections.
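Combining a quiescence marker with a dying-cell marker on the same specimen yields two fractions. The following sketch assumes both markers are scored on the same cells and that every cell negative for the quiescence marker is proliferating, as the indirect estimate above implies. Names and counts are hypothetical.

```python
def population_profile(total: int, non_proliferating: int, dying: int) -> dict:
    """Estimate proliferating and dying fractions from two marker counts on the same cells."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not (0 <= non_proliferating <= total and 0 <= dying <= total):
        raise ValueError("marker counts must lie between 0 and total")
    return {
        # Assumption: cells negative for the quiescence marker are treated as proliferating.
        "proliferating": (total - non_proliferating) / total,
        "dying": dying / total,
    }


# Hypothetical untreated vs. drug-treated sections, 200 cells scored in each.
untreated = population_profile(200, 50, 10)
treated = population_profile(200, 50, 30)
print(untreated)  # {'proliferating': 0.75, 'dying': 0.05}
print(treated)    # {'proliferating': 0.75, 'dying': 0.15}
```

In this hypothetical example the proliferating fraction is unchanged while the dying fraction triples. That pattern would point to cell killing rather than mere growth inhibition.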

In other aspects, a biological or biochemical marker can be used to assess pharmacological
intervention aimed at reducing cell death frequency in degenerative diseases. In degenerative
diseases such as Alzheimer's or Parkinson's disease, cell losses may be due to the premature
activation of the cell death program in neurons. In osteoporosis, the cell loss may be due to an
improper balance between osteoblasts and osteoclasts, with an overly active programmed cell
death process killing more cells than the bone tissue can afford. Related phenomena may also
occur in wound healing, tissue transplantation and cell growth in the glomerulus during
kidney infection, where the balance between living and dying cell populations is essential
to the health of the tissue; these are further described in the section titled "Methods of
treatment". A rapid assessment of dying cell populations can be made through
immunohistochemical and biochemical measurements of a biological or biochemical marker of
apoptosis in degenerative tissues. In one example, a biological or biochemical marker can be used
to assess cell death status in oligodendrocytes associated with multiple sclerosis. Positive staining
with a monoclonal antibody to a marker of apoptosis (such as Tp30, U.S. Patent No. 5,783,667) occurs
in dying cultured human oligodendrocytes. The programmed cell death event is activated in these
oligodendrocytes by total deprivation of serum, or by treatment with tumor necrosis factor (TNF).

In general, a biological or biochemical marker can also be used to assess cell death status in
pharmacological studies in animal models. Correcting either a reduced cell death rate, in
the case of cancer, or an increased cell death rate, in the case of neurodegeneration, has
recently emerged as a new mode of disease intervention. Numerous approaches, using either
known drugs or gene therapy, are in progress, based on correcting the altered programmed
cell death process with the aim of maintaining a balanced cell mass in any given tissue. For
these therapeutic interventions, animal studies bridge studies in cultured cells and clinical
trials, i.e. success in intervention in animal models, in either routine laboratory
animals or transgenic mice bearing either knock-out or overexpression phenotypes. Thus, a
biological or biochemical marker of apoptosis, such as an antibody for an apoptosis-specific
protein, is a useful tool for examining apoptotic death status in terms of the change in dying cell
numbers between normal and experimentally manipulated animals. In this context the invention, as
a diagnostic tool for assessing cell death status, could help to determine the efficacy and potency of
a drug or a gene therapeutic approach.

As discussed, provided are methods for assessing the activity of THAP-family members 
and therapeutic treatment acting on THAP-family members or related biological pathways. 
However, in other aspects, the same methods may be used for assessment of apoptosis in general, 
when a THAP-family member is used as a biological marker of apoptosis. Thus, the invention also 
provides diagnostic and assay methods using a THAP-family member as a marker of cell death or 
apoptotic activity. Further diagnostic assays are also provided herein in the section 
titled 'Diagnostic and prognostic uses'. 

METHODS OF TREATMENT 

A large body of evidence gathered from experiments carried out with apoptosis modulating 
strategies suggests that treatments acting on apoptosis-inducing or cell proliferation-reducing 
proteins may offer new treatment methods for a wide range of disorders. Methods of treatment 
according to the invention may act in a variety of manners, given the novel function provided for a 
number of proteins, and the linking of several biological pathways. 

Provided herein are treatment methods based on the functionalization of the THAP-family 
members. THAP family or THAP domain polypeptides, and biologically active fragments and 
homologues thereof, as described further herein may be useful in modulation of apoptosis or cell 
proliferation. 

The methods of treatment involve acting on a molecule of the invention (that is, a THAP 
family member polypeptide, THAP-family target, or PAR4 or PAR4 target). Included are methods 
which involve modulating THAP-family polypeptide activity, THAP-family target activity, or 
PAR4 or PAR4 target activity. This modulation (increasing or decreasing) of activity can be 
carried out in a number of suitable ways, several of which have been described in the present 
application. 

For example, methods of treatment may involve modulating a "THAP-family activity", 
"biological activity of a THAP-family member" or "functional activity of a THAP-family member". 
Modulating THAP-family activity may involve modulating an association with a THAP-family
target molecule (for example, association of THAP1, THAP2 or THAP3 with PAR4 or association of
THAP1, THAP2 or THAP3 with a PML-NB protein) or preferably any other activity selected from
the group consisting of: (1) mediating apoptosis or cell proliferation when expressed or introduced
into a cell, most preferably inducing or enhancing apoptosis, and/or most preferably reducing cell
proliferation; (2) mediating apoptosis or cell proliferation of an endothelial cell; (3) mediating
apoptosis or cell proliferation of a hyperproliferative cell; (4) mediating apoptosis or cell
proliferation of a CNS cell, preferably a neuronal or glial cell; or (5) an activity determined in an
animal selected from the group consisting of mediating, preferably inhibiting angiogenesis, 
mediating, preferably inhibiting inflammation, inhibition of metastatic potential of cancerous 
tissue, reduction of tumor burden, increase in sensitivity to chemotherapy or radiotherapy, killing a 
cancer cell, inhibition of the growth of a cancer cell, or induction of tumor regression. Detecting 
THAP-family activity may also comprise detecting any suitable therapeutic endpoint associated 
with a disease condition discussed herein. 

In another example, methods of treatment may involve modulating a "PAR4 activity",
"biological activity of PAR4" or "functional activity of PAR4". Modulating PAR4 activity may
involve modulating an association with a PAR4-target molecule (for example THAP1, THAP2, 
THAP3 or PML-NB protein) or most preferably PAR4 apoptosis inducing or enhancing (e.g. signal 
transducing) activity, or inhibition of cell proliferation or cell cycle. 

Methods of treatment may involve modulating the recruitment, binding or association of
proteins to PML-NBs, or otherwise modulating PML-NB activity. The present invention also
provides methods for modulating PAR4 activity, comprising modulating interactions of PAR4 with
THAP-family proteins and of PAR4 with PML-NBs, as well as methods for modulating THAP-family
activity, comprising modulating, for example, THAP1 interactions with PML-NBs. The invention
encompasses inhibiting or increasing the recruitment of THAP1 or PAR4 to PML-NBs. Preventing
the binding of either or both of THAP1 or PAR4 to PML-NBs may increase the bioavailability of
THAP1 and/or PAR4, thus providing a method of increasing THAP1 and/or PAR4 activity. The
invention also encompasses inhibiting or increasing the binding of a THAP-family protein (such as
THAP1) or PAR4 to PML-NBs or to another protein associated with PML-NBs, such as a protein
selected from the group consisting of Daxx, Sp100, Sp140, p53, pRB, CBP, BLM and SUMO-1. For
example, the invention encompasses modulating PAR4 activity by preventing the binding of
THAP1 to PAR4, or by preventing the recruitment or binding of PAR4 to PML-NBs.

Therapeutic methods and compositions of the invention may involve (1) modulating
apoptosis or cell proliferation, most preferably inducing or enhancing apoptosis, and/or most
preferably reducing cell proliferation; (2) modulating apoptosis or cell proliferation of an
endothelial cell; (3) modulating apoptosis or cell proliferation of a hyperproliferative cell; (4)
modulating apoptosis or cell proliferation of a CNS cell, preferably a neuronal or glial cell; (5)
inhibition of metastatic potential of cancerous tissue, reduction of tumor burden, increase in
sensitivity to chemotherapy or radiotherapy, killing a cancer cell, inhibition of the growth of a
cancer cell, or induction of tumor regression; or (6) interaction with a THAP family target molecule
or THAP domain target molecule, preferably interaction with a protein or a nucleic acid. Methods
may also involve improving a symptom of or ameliorating a condition as further described herein.

Antiapoptotic therapy

Molecules of the invention (e.g. those obtained using the screening methods described
herein, dominant negative mutants, antibodies, etc.) which inhibit apoptosis are also expected to be
useful in the treatment and/or prevention of disease. Diseases in which it is desirable to prevent
apoptosis include neurodegenerative diseases such as Alzheimer's disease, Parkinson's disease,
amyotrophic lateral sclerosis, retinitis pigmentosa and cerebellar degeneration; myelodysplasia such
as aplastic anemia; ischemic diseases such as myocardial infarction and stroke; hepatic diseases
such as alcoholic hepatitis, hepatitis B and hepatitis C; joint diseases such as osteoarthritis;
atherosclerosis; and the like. The apoptosis inhibitors of the present invention are especially
preferably used as agents for prophylaxis or treatment of a neurodegenerative disease (see also
Adams, J. M., Science 281:1322 (1998)).

Included as inhibitors of apoptosis as described herein are generally any molecules which
inhibit activity of a THAP family or THAP domain polypeptide, or a biologically active fragment
or homologue thereof, a THAP-family target protein or PAR4 (particularly PAR4/PML-NB protein
interactions). Inhibitors of THAP-family and THAP domain polypeptides may include, for example,
antibodies, peptides, dominant negative THAP-family or THAP domain analogs, small molecules,
ribozymes or antisense nucleic acids. These inhibitors may be particularly advantageous in the
treatment of neurodegenerative disorders. Particularly preferred are inhibitors which affect binding
of a THAP-family protein to a THAP-family target protein, and inhibitors which affect the DNA
binding activity of a THAP-family protein.

In further preferred aspects the invention provides inhibitors of THAP-family activity,
including but not limited to molecules which interfere with or inhibit interactions of THAP-family
proteins with PAR4, for the treatment of endothelial cell related disorders and neurodegenerative
disorders. Support is found in the literature, as PAR4 appears to play a key role in neuronal
apoptosis in various neurodegenerative disorders (Guo et al., 1998; Mattson et al., 2000; Mattson et
al., 1999; Mattson et al., 2001). THAP1, which is expressed in brain and associates with PAR4, may
therefore also play a key role in neuronal apoptosis. Drugs that inhibit THAP-family activity and/or
inhibit THAP-family/PAR4 complex formation may lead to the development of novel preventative
and therapeutic strategies for neurodegenerative disorders.

Apoptosis regulation in endothelial cells 

The invention also provides methods of regulating angiogenesis in a subject which are
expected to be useful in the treatment of cancer, cardiovascular diseases and inflammatory diseases.
An inducer of apoptosis of immortalized cells is expected to be useful in suppressing tumorigenesis
and/or metastasis in malignant tumors. Examples of malignant tumors include leukemia (for
example, myelocytic leukemia, lymphocytic leukemia such as Burkitt lymphoma), digestive tract
carcinoma, lung carcinoma, pancreas carcinoma, ovary carcinoma, uterus carcinoma, brain tumor,
malignant melanoma, other carcinomas, and sarcomas. The present inventors have isolated both
THAP1 and PAR4 cDNAs from human endothelial cells, and both PAR4 and PML are known to be
expressed predominantly in blood vessel endothelial cells (Boghaert et al. (1997) Cell Growth
Differ. 8(8):881-890; Terris B. et al. (1995) Cancer Res. 55(7):1590-1597), suggesting that the
PML-NBs and the newly associated THAP1/PAR4 proapoptotic complex may be a major regulator
of endothelial cell apoptosis in vivo and thus constitute an attractive therapeutic target for
angiogenesis-dependent diseases. For example, THAP1 and PAR4 pathways may allow selective
treatments that regulate (e.g. stimulate or inhibit) angiogenesis.

In a first aspect, the invention provides methods of inhibiting endothelial cell apoptosis, by 
administering a THAP1 or PAR4 inhibitor, or optionally a THAP1/PAR4 interaction inhibitor or 
optionally an inhibitor of THAP1 DNA binding activity. As further described herein, the THAP 
domain is involved in THAP1 pro-apoptotic activity. Deletion of the THAP domain abrogates the 
proapoptotic activity of THAP1 in mouse 3T3 fibroblasts, as shown in Example 11. Also, as
further described herein, deletion of residues 168-172 or replacement of residues 171-172 abrogates 
THAP1 binding to PAR4 both in vitro and in vivo and results in lack of recruitment of PAR4 by 
THAP1 to PML-NBs. For PAR4, the leucine zipper domain is required (and is sufficient) for 
binding to THAP1. 

Inhibiting endothelial cell apoptosis may improve angiogenesis and vasculogenesis in 
patients with ischemia and may also interfere with focal dysregulated vascular remodeling, the key 
mechanism for atherosclerotic disease progression. 

In another aspect, the invention provides methods of inducing endothelial cell apoptosis, by 
administering for example a biologically active THAP family polypeptide such as THAP1, a THAP 
domain polypeptide or a PAR4 polypeptide, or a biologically active fragment or homologue thereof, 
or a THAP1 or PAR4 stimulator. Stimulation of endothelial cell apoptosis may prevent or inhibit 
angiogenesis and thus limit unwanted neovascularization of tumors or inflamed tissues (see 
Dimmeler and Zeiher, Circulation Research, 2000, 87:434-439).

Angiogenesis 

Angiogenesis is defined in the adult organism as the formation of new blood vessels by a
process of sprouting from pre-existing vessels. This neovascularization involves activation,
migration, and proliferation of endothelial cells and is driven by several stimuli, among them shear
stress. Under normal physiological conditions, humans or animals undergo angiogenesis only in
very specific restricted situations. For example, angiogenesis is normally observed in wound 
healing, fetal and embryonal development and formation of the corpus luteum, endometrium and 
placenta. Molecules of the invention may have endothelial inhibiting or inducing activity, having 
the capability to inhibit or induce angiogenesis in general. 

Both controlled and uncontrolled angiogenesis are thought to proceed in a similar manner.
Endothelial cells and pericytes, surrounded by a basement membrane, form capillary blood vessels.
Angiogenesis begins with the erosion of the basement membrane by enzymes released by
endothelial cells and leukocytes. The endothelial cells, which line the lumen of blood vessels, then
protrude through the basement membrane. Angiogenic stimulants induce the endothelial cells to
migrate through the eroded basement membrane. The migrating cells form a "sprout" off the parent
blood vessel, where the endothelial cells undergo mitosis and proliferate. The endothelial sprouts
merge with each other to form capillary loops, creating the new blood vessel.

Persistent, unregulated angiogenesis occurs in a multiplicity of disease states, in tumor
metastasis and in abnormal growth of endothelial cells, and supports the pathological damage seen
in these conditions. The diverse pathological disease states in which unregulated angiogenesis is
present have been grouped together as angiogenic-dependent or angiogenic-associated diseases. It
is thus an object of the present invention to provide methods and compositions for treating diseases
and processes that are mediated by angiogenesis including, but not limited to, hemangioma, solid
tumors, leukemia, metastasis, telangiectasia, psoriasis, scleroderma, pyogenic granuloma,
myocardial angiogenesis, plaque neovascularization, coronary collaterals, ischemic limb
angiogenesis, corneal diseases, rubeosis, neovascular glaucoma, diabetic retinopathy, retrolental
fibroplasia, arthritis, diabetic neovascularization, macular degeneration, wound healing, peptic
ulcer, fractures, keloids, vasculogenesis, hematopoiesis, ovulation, menstruation, and placentation.

(i) Anti-angiogenic therapy 

In one aspect the invention provides anti-angiogenic therapies as potential treatments for a 
wide variety of diseases, including cancer, arteriosclerosis, obesity, arthritis, duodenal ulcers, 
psoriasis, proliferative skin disorders, cardiovascular disorders and abnormal ocular 
neovascularization caused, for example, by diabetes (Folkman, Nature Medicine 1:27 (1995) and 
Folkman, Seminars in Medicine of the Beth Israel Hospital, Boston, New England Journal of 
Medicine, 333:1757 (1995)). Anti-angiogenic therapies are thought to act by inhibiting the 
formation of new blood vessels. 

The present invention thus provides methods and compositions for treating diseases and 
processes mediated by undesired and uncontrolled angiogenesis by administering to a human or 
animal a composition comprising a substantially purified THAP family or THAP domain 
polypeptide, or a biologically active fragment, homologue or derivative thereof in a dosage 
sufficient to inhibit angiogenesis, administering a vector capable of expressing a nucleic acid 
encoding a THAP-family or THAP domain protein, or administering any other inducer of 
expression or activity of a THAP-family or THAP domain protein. The present invention is 
particularly useful for treating or for repressing the growth of tumors. Administration of THAP- 
family or THAP domain nucleic acid, protein or other inducer to a human or animal with 
prevascularized metastasized tumors will prevent the growth or expansion of those tumors. THAP- 
family activity may be used in combination with other compositions and procedures for the 
treatment of diseases. For example, a tumor may be treated conventionally with surgery, radiation 
or chemotherapy combined with THAP-family or THAP domain protein and then THAP-family or
THAP domain protein may be subsequently administered to the patient to extend the dormancy of
micrometastases and to stabilize any residual primary tumor. 

In a preferred example, a THAP-family polypeptide activity, preferably a THAP1 activity,
is used for the treatment of arthritis, for example rheumatoid arthritis. Rheumatoid arthritis is
characterized by symmetric, polyarticular inflammation of synovial-lined joints, and may involve 
extraarticular tissues, such as the pericardium, lung, and blood vessels. 

(ii) Angiogenic therapy 

In another aspect, inhibitors of THAP-family protein activity, particularly THAP1
activity, could be used as anti-apoptotic and thus angiogenic therapies. Angiogenic therapies
are potential treatments for promoting wound healing and for stimulating the growth of new blood
vessels to bypass occluded ones. Thus, pro-angiogenic therapies could potentially augment or
replace bypass surgery and balloon angioplasty (PTCA). For example, with respect to
neovascularization to bypass occluded blood vessels, a "therapeutically effective amount" is a 
quantity which results in the formation of new blood vessels which can transport at least some of 
the blood which normally would pass through the blocked vessel. 

The THAP-family protein of the present invention can for example be used to generate 
antibodies that can be used as inhibitors of apoptosis. The antibodies can be either polyclonal 
antibodies or monoclonal antibodies. In addition, these antibodies that specifically bind to the 
THAP-family protein can be used in diagnostic methods and kits that are well known to those of 
ordinary skill in the art to detect or quantify the THAP-family protein in a body fluid. Results from 
these tests can be used to diagnose or predict the occurrence or recurrence of a cancer and other 
angiogenic mediated diseases. 

It will be appreciated that other inhibitors of THAP-family and THAP domain proteins can 
also be used in angiogenic therapies, including for example small molecules, antisense nucleic 
acids, dominant negative THAP-family and THAP domain proteins or peptides identified using the 
above methods. 

In view of applications in both angiogenic and antiangiogenic therapies, molecules of the 
invention may have endothelial inhibiting or inducing activity, having the capability to inhibit or 
induce angiogenesis in general. It will be appreciated that methods of assessing such capability are
known in the art, including for example assessing antiangiogenic properties as the ability to inhibit
the growth of bovine capillary endothelial cells in culture in the presence of fibroblast growth factor.
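The text does not specify how inhibition of bovine capillary endothelial cell growth is quantified. One common convention, assumed here rather than taken from this document, normalizes the treated cell count between the unstimulated (basal) count and the FGF-stimulated control count. The function name and counts below are hypothetical.

```python
def percent_inhibition(treated: float, fgf_control: float, basal: float) -> float:
    """Percent inhibition of FGF-stimulated growth.

    Assumed convention: 0% means the same count as FGF alone,
    100% means growth is reduced back to the unstimulated (basal) count.
    """
    stimulated_growth = fgf_control - basal
    if stimulated_growth <= 0:
        raise ValueError("FGF control must exceed the basal count")
    return 100.0 * (1.0 - (treated - basal) / stimulated_growth)


# Hypothetical cell counts per well at the end of the assay period.
print(percent_inhibition(treated=25_000, fgf_control=40_000, basal=10_000))  # 50.0
```

Values above 100% under this convention would indicate that the treated count fell below basal, which may reflect cell killing rather than growth arrest alone.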

It is to be understood that the present invention is contemplated to include any derivatives 
of the THAP family or THAP domain polypeptides, and biologically active fragments and 
homologues thereof that have endothelial inhibitory or apoptotic activity. The present invention 
includes full-length THAP-family and THAP domain proteins, derivatives of the THAP-family and 
THAP domain proteins and biologically-active fragments of the THAP-family and THAP domain 
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proteins. These include proteins with THAP-family protein activity that have amino acid 
substitutions or have sugars or other molecules attached to amino acid functional groups. The 
methods also contemplate the use of genes that code for a THAP-family protein and of proteins that 
are expressed by those genes. 

As discussed, several methods are described herein for delivering a modulator to a subject 
in need of treatment, including for example small molecule modulators, nucleic acids including via 
gene therapy vectors, and polypeptides including peptide mimetics, active polypeptides, dominant 
negative polypeptides and antibodies. It will thus be appreciated that modulators of the 
invention identified according to the methods in the section titled "Drug Screening Assays" can be 
further tested in cell or animal models for their ability to ameliorate or prevent a condition 
involving a THAP-family polypeptide, particularly THAP1, THAP2 or THAP3/PAR4 
interactions, THAP-family DNA binding or PAR4 / PML-NBs interactions. Likewise, nucleic 
acids, polypeptides and vectors (e.g. viral) can also be assessed in a similar manner. 

An "individual" treated by the methods of this invention is a vertebrate, particularly a 
mammal (including model animals of human disease, farm animals, sport animals, and pets), and 
typically a human. 

"Treatment" refers to clinical intervention in an attempt to alter the natural course of the 
individual being treated, and may be performed either for prophylaxis or during the course of 
clinical pathology. Desirable effects include preventing occurrence or recurrence of disease, 
alleviation of symptoms, diminishment of any direct or indirect pathological consequences of the 
disease, such as hyperresponsiveness, inflammation, or necrosis, lowering the rate of disease 
progression, amelioration or palliation of the disease state, and remission or improved prognosis. 
The "pathology" associated with a disease condition is anything that compromises the well-being, 
normal physiology, or quality of life of the affected individual. 

Treatment is performed by administering an effective amount of a THAP-family 
polypeptide inhibitor or activator. An "effective amount" is an amount sufficient to effect a 
beneficial or desired clinical result, and can be administered in one or more doses. 

The criteria for assessing response to therapeutic modalities employing the 
compositions of this invention are dictated by the specific condition, measured according to 
standard medical procedures appropriate for the condition. 

Pharmaceutical Compositions 

Compounds capable of inhibiting THAP-family activity, preferably small molecules but 
also including peptides, THAP-family nucleic acid molecules, THAP-family proteins, and anti- 
THAP-family antibodies (also referred to herein as "active compounds") of the invention can be 
incorporated into pharmaceutical compositions suitable for administration. Such compositions 
typically comprise a pharmaceutically acceptable carrier. As used herein the language 
"pharmaceutically acceptable carrier" is intended to include any and all solvents, dispersion media, 
coatings, antibacterial and antifungal agents, isotonic and absorption delaying agents, and the like, 
compatible with pharmaceutical administration. The use of such media and agents for 
pharmaceutically active substances is well known in the art. Except insofar as any conventional 
media or agent is incompatible with the active compound, use thereof in the compositions is 
contemplated. Supplementary active compounds can also be incorporated into the compositions. 

A pharmaceutical composition of the invention is formulated to be compatible with its 
intended route of administration. Examples of routes of administration include parenteral, e.g., 
intravenous, intradermal, subcutaneous, oral, inhalation, transdermal (topical), transmucosal, 
and rectal administration. Solutions or suspensions used for parenteral, intradermal, or 
subcutaneous application can include the following components: a sterile diluent such as water for 
injection, saline solution, fixed oils, polyethylene glycols, glycerine, propylene glycol or other 
synthetic solvents; antibacterial agents such as benzyl alcohol or methyl parabens; antioxidants such 
as ascorbic acid or sodium bisulfite; chelating agents such as ethylenediaminetetraacetic acid; 
buffers such as acetates, citrates or phosphates and agents for the adjustment of tonicity such as 
sodium chloride or dextrose. pH can be adjusted with acids or bases, such as hydrochloric acid or 
sodium hydroxide. The parenteral preparation can be enclosed in ampoules, disposable syringes or 
multiple dose vials made of glass or plastic. 

Pharmaceutical compositions suitable for injectable use include sterile aqueous solutions 
(where water soluble) or dispersions and sterile powders for the extemporaneous preparation of 
sterile injectable solutions or dispersions. For intravenous administration, suitable carriers include 
physiological saline, bacteriostatic water, Cremophor EL™ (BASF, Parsippany, N.J.) or phosphate 
buffered saline (PBS). In all cases, the composition must be sterile and should be fluid to the extent 
that easy syringability exists. It must be stable under the conditions of manufacture and storage and 
must be preserved against the contaminating action of microorganisms such as bacteria and fungi. 
The carrier can be a solvent or dispersion medium containing, for example, water, ethanol, polyol 
(for example, glycerol, propylene glycol, and liquid polyethylene glycol, and the like), and suitable 
mixtures thereof. The proper fluidity can be maintained, for example, by the use of a coating such 
as lecithin, by the maintenance of the required particle size in the case of dispersion and by the use 
of surfactants. Prevention of the action of microorganisms can be achieved by various antibacterial 
and antifungal agents, for example, parabens, chlorobutanol, phenol, ascorbic acid, thimerosal, and 
the like. In many cases, it will be preferable to include isotonic agents, for example, sugars, 
polyalcohols such as mannitol, sorbitol, sodium chloride in the composition. Prolonged absorption of 
the injectable compositions can be brought about by including in the composition an agent which 
delays absorption, for example, aluminum monostearate and gelatin. 
Where the active compound is a protein, peptide or anti-THAP-family antibody, sterile 
injectable solutions can be prepared by incorporating the active compound in the required 
amount in an appropriate solvent with one or a combination of ingredients enumerated above, as 
required, followed by filter sterilization. Generally, dispersions are prepared by incorporating the 
active compound into a sterile vehicle which contains a basic dispersion medium and the required 
other ingredients from those enumerated above. In the case of sterile powders for the preparation of 
sterile injectable solutions, the preferred methods of preparation are vacuum drying and freeze- 
drying which yields a powder of the active ingredient plus any additional desired ingredient from a 
previously sterile-filtered solution thereof. 

Oral compositions generally include an inert diluent or an edible carrier. They can be 
enclosed in gelatin capsules or compressed into tablets. For the purpose of oral therapeutic 
administration, the active compound can be incorporated with excipients and used in the form of 
tablets, troches, or capsules. For administration by inhalation, the compounds are delivered in the 
form of an aerosol spray from a pressurized container or dispenser which contains a suitable propellant, 
e.g., a gas such as carbon dioxide, or a nebulizer. Systemic administration can also be by 
transmucosal or transdermal means. For transmucosal or transdermal administration, penetrants 
appropriate to the barrier to be permeated are used in the formulation. Such penetrants are generally 
known in the art, and include, for example, for transmucosal administration, detergents, bile salts, 
and fusidic acid derivatives. Transmucosal administration can be accomplished through the use of 
nasal sprays or suppositories. For transdermal administration, the active compounds are formulated 
into ointments, salves, gels, or creams as generally known in the art. Most preferably, the active 
compound is delivered to a subject by intravenous injection. 

In one embodiment, the active compounds are prepared with carriers that will protect the 
compound against rapid elimination from the body, such as a controlled release formulation, 
including implants and microencapsulated delivery systems. Biodegradable, biocompatible 
polymers can be used, such as ethylene vinyl acetate, polyanhydrides, polyglycolic acid, collagen, 
polyorthoesters, and polylactic acid. Methods for preparation of such formulations will be apparent 
to those skilled in the art. The materials can also be obtained commercially from Alza Corporation 
and Nova Pharmaceuticals, Inc. Liposomal suspensions (including liposomes targeted to infected 
cells with monoclonal antibodies to viral antigens) can also be used as pharmaceutically acceptable 
carriers. These can be prepared according to methods known to those skilled in the art, for example, 
as described in U.S. Pat. No. 4,522,811. 

It is especially advantageous to formulate oral or preferably parenteral compositions in 
dosage unit form for ease of administration and uniformity of dosage. Dosage unit form as used 
herein refers to physically discrete units suited as unitary dosages for the subject to be treated; each 
unit containing a predetermined quantity of active compound calculated to produce the desired 
therapeutic effect in association with the required pharmaceutical carrier. The specification for the 
dosage unit forms of the invention is dictated by and directly dependent on the unique 
characteristics of the active compound and the particular therapeutic effect to be achieved, and the 
limitations inherent in the art of compounding such an active compound for the treatment of 
individuals. 

Toxicity and therapeutic efficacy of such compounds can be determined by standard 
pharmaceutical procedures in cell cultures or experimental animals, e.g., for determining the LD50 
(the dose lethal to 50% of the population) and the ED50 (the dose therapeutically effective in 50% 
of the population). The dose ratio between toxic and therapeutic effects is the therapeutic index and 
it can be expressed as the ratio LD50/ED50. Compounds which exhibit large therapeutic indices are 
preferred. While compounds that exhibit toxic side effects may be used, care should be taken to 
design a delivery system that targets such compounds to the site of affected tissue in order to 
minimize potential damage to unaffected cells and, thereby, reduce side effects. 
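As a minimal sketch of the therapeutic index arithmetic described above (TI = LD50/ED50, larger is preferred), the following ranks candidate compounds by their index. The compound names and dose values are hypothetical placeholders for illustration, not data from this specification.

```python
def therapeutic_index(ld50: float, ed50: float) -> float:
    """Return the therapeutic index LD50/ED50 (both in the same dose units)."""
    if ld50 <= 0 or ed50 <= 0:
        raise ValueError("LD50 and ED50 must be positive")
    return ld50 / ed50


# Hypothetical (LD50, ED50) pairs in mg/kg, for illustration only.
candidates = {"compound_A": (200.0, 10.0), "compound_B": (50.0, 25.0)}

# Rank candidates so the compound with the largest therapeutic index comes first.
ranked = sorted(candidates, key=lambda c: therapeutic_index(*candidates[c]), reverse=True)
print(ranked)  # ['compound_A', 'compound_B']
```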

The data obtained from the cell culture assays and animal studies can be used in 
formulating a range of dosage for use in humans. The dosage of such compounds lies preferably 
within a range of circulating concentrations that include the ED50 with little or no toxicity. The 
dosage may vary within this range depending upon the dosage form employed and the route of 
administration utilized. For any compound used in the method of the invention, the therapeutically 
effective dose can be estimated initially from cell culture assays. A dose may be formulated in 
animal models to achieve a circulating plasma concentration range that includes the IC50 (i.e., the 
concentration of the test compound which achieves a half-maximal inhibition of symptoms) as 
determined in cell culture. Such information can be used to more accurately determine useful doses 
in humans. Levels in plasma may be measured, for example, by high performance liquid 
chromatography. 
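The IC50 estimate from cell culture data mentioned above can be sketched, under simplifying assumptions, by linear interpolation of percent inhibition against log10(dose). This is a swapped-in, deliberately simple technique rather than a full sigmoidal (e.g. four-parameter logistic) curve fit, and the dose-response values below are hypothetical.

```python
import math


def estimate_ic50(doses, inhibition):
    """Estimate the IC50 by linear interpolation on log10(dose).

    doses must be positive and increasing; inhibition holds the percent
    inhibition (0-100) measured at each dose. Returns None if the measured
    response never reaches 50%.
    """
    points = list(zip(doses, inhibition))
    for (d1, y1), (d2, y2) in zip(points, points[1:]):
        if y1 == 50.0:
            return d1
        if y1 < 50.0 <= y2:
            # Fraction of the way from y1 to y2 at which 50% inhibition is reached.
            frac = (50.0 - y1) / (y2 - y1)
            log_ic50 = math.log10(d1) + frac * (math.log10(d2) - math.log10(d1))
            return 10 ** log_ic50
    return None


# Hypothetical cell culture data: concentrations (uM) and percent inhibition.
ic50 = estimate_ic50([0.1, 1.0, 10.0, 100.0], [5, 20, 80, 95])
```

Here 50% lies halfway between the 1 uM and 10 uM points on the log scale, so the estimate is about 3.16 uM; a plasma concentration range targeting this value could then be explored in animal models as the text describes.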

The pharmaceutical compositions can be included in a container, pack, or dispenser 
together with instructions for administration. 

Diagnostic and Prognostic Uses 

The nucleic acid molecules, proteins, protein homologues, and antibodies described herein 
can be used in one or more of the following methods: diagnostic assays, prognostic assays, 
monitoring clinical trials, and pharmacogenetics; and in drug screening and methods of treatment 
(e.g., therapeutic and prophylactic) as further described herein. 

The invention provides diagnostic and prognostic assays for detecting THAP-family 
members, as further described. Also provided are diagnostic and prognostic assays for detecting 
interactions between THAP-family members and THAP-family target molecules. In a preferred 
example, a THAP-family member is THAP1, THAP2 or THAP3 and the THAP-family target is 
PAR4 or a PML-NB protein. 

The invention also provides diagnostic and prognostic assays for detecting THAP1 and/or 
PAR4 localization to or association with PML-NBs, or association with or binding to a PML-NB- 
associated protein, such as daxx, Sp100, Sp140, p53, pRB, CBP, BLM or SUMO-1. In a preferred 
method, the invention provides detecting PAR4 localization to or association with PML-NBs. In a 
further aspect, the invention provides detecting THAP-family nucleic acid binding activity. 

The isolated nucleic acid molecules of the invention can be used, for example, to detect 
THAP-family polypeptide mRNA (e.g., in a biological sample) or a genetic alteration in a THAP- 
family gene, and to modulate a THAP-family polypeptide activity, as described further below. The 
THAP-family proteins can be used to treat disorders characterized by insufficient or excessive 
production of a THAP-family protein or THAP-family target molecules. In addition, the THAP- 
family proteins can be used to screen for naturally occurring THAP-family target molecules, to 
screen for drugs or compounds which modulate, preferably inhibit, THAP-family activity, as well as 
to treat disorders characterized by insufficient or excessive production of THAP-family protein or 
production of THAP-family protein forms which have decreased or aberrant activity compared to 
THAP-family wild type protein. Moreover, the anti-THAP-family antibodies of the invention can 
be used to detect and isolate THAP-family proteins, regulate the bioavailability of THAP-family 
proteins, and modulate THAP-family activity. 

Accordingly, one embodiment of the present invention involves a method of use (e.g., a 
diagnostic assay, prognostic assay, or a prophylactic/therapeutic method of treatment) wherein a 
molecule of the present invention (e.g., a THAP-family protein, THAP-family nucleic acid, or most 
preferably a THAP-family inhibitor or activator) is used, for example, to diagnose, prognose and/or 
treat a disease and/or condition in which any of the aforementioned THAP-family activities is 
indicated. In another embodiment, the present invention involves a method of use (e.g., a diagnostic 
assay, prognostic assay, or a prophylactic/therapeutic method of treatment) wherein a molecule of 
the present invention (e.g., a THAP-family protein, THAP-family nucleic acid, or a THAP-family 
inhibitor or activator) is used, for example, for the diagnosis, prognosis, and/or treatment of 
subjects, preferably a human subject, in which any of the aforementioned activities is pathologically 
perturbed. In a preferred embodiment, the methods of use (e.g., diagnostic assays, prognostic 
assays, or prophylactic/therapeutic methods of treatment) involve administering to a subject, 
preferably a human subject, a molecule of the present invention (e.g., a THAP-family protein, 
THAP-family nucleic acid, or a THAP-family inhibitor or activator) for the diagnosis, prognosis, 
and/or therapeutic treatment. In another embodiment, the methods of use (e.g., diagnostic assays, 
prognostic assays, or prophylactic/therapeutic methods of treatment) involve administering to a 
human subject a molecule of the present invention (e.g., a THAP-family protein, THAP-family 
nucleic acid, or a THAP-family inhibitor or activator). 

For example, the invention encompasses a method of determining whether a THAP-family 
member is expressed within a biological sample comprising: a) contacting said biological sample 
with: i) a polynucleotide that hybridizes under stringent conditions to a THAP-family nucleic acid; 
or ii) a detectable polypeptide (e.g. antibody) that selectively binds to a THAP-family polypeptide; 
and b) detecting the presence or absence of hybridization between said polynucleotide and an RNA 
species within said sample, or the presence or absence of binding of said detectable polypeptide to a 
polypeptide within said sample. A detection of said hybridization or of said binding indicates that 
said THAP-family member is expressed within said sample. Preferably, either the polynucleotide is a 
primer and said hybridization is detected by detecting the presence of an amplification 
product comprising said primer sequence, or the detectable polypeptide is an antibody. 

Also envisioned is a method of determining whether a mammal, preferably human, has an 
elevated or reduced level of expression of a THAP-family member, comprising: a) providing a 
biological sample from said mammal; and b) comparing the amount of a THAP-family polypeptide 
or of a THAP-family RNA species encoding a THAP-family polypeptide within said biological 
sample with a level detected in or expected from a control sample. An increased amount of said 
THAP-family polypeptide or said THAP-family RNA species within said biological sample 
compared to said level detected in or expected from said control sample indicates that said mammal 
has an elevated level of THAP-family expression, and a decreased amount of said THAP-family 
polypeptide or said THAP-family RNA species within said biological sample compared to said 
level detected in or expected from said control sample indicates that said mammal has a reduced 
level of expression of a THAP-family member. 
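The comparison step above can be sketched as a ratio of the sample level to the control level. The two-fold cut-off used here is a hypothetical parameter chosen for illustration; the specification does not define a threshold for "elevated" or "reduced" expression.

```python
def classify_expression(sample_level, control_level, fold_threshold=2.0):
    """Compare a measured THAP-family expression level with a control level.

    fold_threshold is a hypothetical cut-off, not one given in the source:
    ratios >= fold_threshold are called elevated, ratios <= 1/fold_threshold reduced.
    """
    if control_level <= 0:
        raise ValueError("control level must be positive")
    ratio = sample_level / control_level
    if ratio >= fold_threshold:
        return "elevated"
    if ratio <= 1.0 / fold_threshold:
        return "reduced"
    return "not significantly different"


# Hypothetical normalized signals (e.g. RNA or protein units) for sample vs. control.
print(classify_expression(500, 100))  # elevated
print(classify_expression(40, 100))   # reduced
```

In practice the control level would come from a matched control sample or an expected reference range, and the cut-off would need to be set from assay variability.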

The present invention also pertains to the field of predictive medicine in which diagnostic 
assays, prognostic assays, and monitoring clinical trials are used for prognostic (predictive) 
purposes to thereby treat an individual prophylactically. Accordingly, one aspect of the present 
invention relates to diagnostic assays for determining THAP-family protein and/or nucleic acid 
expression as well as THAP-family activity, in the context of a biological sample (e.g., blood, 
serum, cells, tissue) to thereby determine whether an individual is afflicted with a disease or 
disorder, or is at risk of developing a disorder, associated with aberrant THAP-family expression or 
activity. The invention also provides for prognostic (or predictive) assays for determining whether 
an individual is at risk of developing a disorder associated with a THAP-family protein, nucleic 
acid expression or activity. For example, mutations in a THAP-family gene can be assayed in a 
biological sample. Such assays can be used for prognostic or predictive purposes to thereby 
prophylactically treat an individual prior to the onset of a disorder characterized by or associated 
with a THAP-family protein, nucleic acid expression or activity. 

Accordingly, the methods of the present invention are applicable generally to diseases 
related to regulation of apoptosis, including but not limited to disorders characterized by unwanted 
cell proliferation or generally aberrant control of differentiation, for example neoplastic or 
hyperplastic disorders, as well as disorders related to proliferation or lack thereof of endothelial 
cells, inflammatory disorders and neurodegenerative disorders. 

Diagnostic Assays 

An exemplary method for detecting the presence (quantitative or not) or absence of a 
THAP-family protein or nucleic acid in a biological sample involves obtaining a biological sample 
from a test subject and contacting the biological sample with a compound or an agent capable of 
detecting a THAP-family protein or nucleic acid (e.g., mRNA, genomic DNA) that encodes THAP- 
family protein such that the presence of the THAP-family protein or nucleic acid is detected in the 
biological sample. A preferred agent for detecting a THAP-family mRNA or genomic DNA is a 
labeled nucleic acid probe capable of hybridizing to a THAP-family mRNA or genomic DNA. The 
nucleic acid probe can be, for example, a full-length THAP-family nucleic acid, such as the nucleic 
acid of SEQ ID NO: 160, or a portion of a THAP-family nucleic acid, such as a nucleic acid of at least 
15, 30, 50, 100, 250, 400, 500 or 1000 nucleotides in length that is sufficient to specifically hybridize 
under stringent conditions to a THAP-family mRNA or genomic DNA. Other suitable probes 
for use in the diagnostic assays of the invention are described herein. 
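A probe of the kind described above must be complementary to its target. The sketch below derives an antisense 15-nt probe from a target stretch, checks perfect complementarity, and gives a rough melting temperature by the Wallace rule, Tm = 2(A+T) + 4(G+C) in degrees C, which is only an approximation for short oligonucleotides. The target sequence is a hypothetical placeholder, not SEQ ID NO: 160 or any real THAP-family sequence.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_fraction(probe: str) -> float:
    """Fraction of G and C bases in the probe."""
    p = probe.upper()
    return (p.count("G") + p.count("C")) / len(p)


def wallace_tm(probe: str) -> int:
    """Rough Tm (deg C) of a short oligo by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    p = probe.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


def probe_matches(probe: str, target: str) -> bool:
    """True if the probe is perfectly complementary to some stretch of the target."""
    return reverse_complement(probe) in target.upper()


# Hypothetical target fragment, NOT a real THAP-family sequence.
target = "ATGGTGCAGGCTTGCTCCGCCTACGGATC"
probe = reverse_complement(target[:15])  # 15-nt antisense probe
print(probe, gc_fraction(probe), wallace_tm(probe), probe_matches(probe, target))
```

Actual stringent hybridization conditions depend on salt, formamide and probe length, so a nearest-neighbor Tm model would normally replace the Wallace rule for longer probes.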

In preferred embodiments, the subject method can be characterized by generally comprising 
detecting, in a tissue sample of the subject (e.g. a human patient), the presence or absence of a 
genetic lesion characterized by at least one of (i) a mutation of a gene encoding one of the subject 
THAP-family proteins or (ii) the mis-expression of a THAP-family gene. To illustrate, such genetic 
lesions can be detected by ascertaining the existence of at least one of (i) a deletion of one or more 
nucleotides from a THAP-family gene, (ii) an addition of one or more nucleotides to such a THAP- 
family gene, (iii) a substitution of one or more nucleotides of a THAP-family gene, (iv) a gross 
chromosomal rearrangement or amplification of a THAP-family gene, (v) a gross alteration in the 
level of a messenger RNA transcript of a THAP-family gene, (vi) aberrant modification of a THAP- 
family gene, such as of the methylation pattern of the genomic DNA, (vii) the presence of a non- 
wild type splicing pattern of a messenger RNA transcript of a THAP-family gene, and (viii) a non- 
wild type level of a THAP-family target protein. 
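Lesion types (i) to (iii) above (deletion, addition and substitution of nucleotides) can be distinguished once a sample allele has been aligned to the reference. The sketch below assumes VCF-style left-anchored alleles (e.g. reference "AT" to sample "A" is a one-base deletion); the function name and "complex" category are illustrative choices, and lesion types (iv) to (viii) require other assays.

```python
def classify_variant(ref: str, alt: str) -> str:
    """Classify a small change between reference and sample alleles.

    Assumes VCF-like left-anchored alleles that share their first base(s)
    for insertions and deletions.
    """
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        return "no change"
    if len(ref) == len(alt):
        return "substitution"
    if len(alt) < len(ref) and ref.startswith(alt):
        return "deletion"
    if len(alt) > len(ref) and alt.startswith(ref):
        return "insertion"
    return "complex"


for ref, alt in [("C", "T"), ("AT", "A"), ("A", "AGG")]:
    print(ref, "->", alt, classify_variant(ref, alt))
```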

A preferred agent for detecting a THAP-family protein is an antibody capable of binding to 
a THAP-family protein, preferably an antibody with a detectable label. Antibodies can be 
polyclonal, or more preferably, monoclonal. An intact antibody, or a fragment thereof (e.g., Fab or 
F(ab')2) can be used. The term "labeled", with regard to the probe or antibody, is intended to 
encompass direct labeling of the probe or antibody by coupling (i.e., physically linking) a detectable 
substance to the probe or antibody, as well as indirect labeling of the probe or antibody by reactivity 
with another reagent that is directly labeled. Examples of indirect labeling include detection of a 
primary antibody using a fluorescently labeled secondary antibody and end-labeling of a DNA 
probe with biotin such that it can be detected with fluorescently labeled streptavidin. The term 
"biological sample" is intended to include tissues, cells and biological fluids isolated from a subject, 
as well as tissues, cells and fluids present within a subject. That is, the detection method of the 
invention can be used to detect a THAP-family mRNA, protein, or genomic DNA in a biological 
sample in vitro as well as in vivo. For example, in vitro techniques for detection of a THAP-family 
mRNA include Northern hybridizations and in situ hybridizations. In vitro techniques for detection 
of a THAP-family protein include enzyme linked immunosorbent assays (ELISAs), Western blots, 
immunoprecipitations and immunofluorescence. In vitro techniques for detection of a THAP-family 
genomic DNA include Southern hybridizations. Furthermore, in vivo techniques for detection of a 
THAP-family protein include introducing into a subject a labeled anti-THAP-family antibody. For 
example, the antibody can be labeled with a radioactive marker whose presence and location in a 
subject can be detected by standard imaging techniques. 

In yet another exemplary embodiment, aberrant methylation patterns of a THAP-family 
gene can be detected by digesting genomic DNA from a patient sample with one or more restriction 
endonucleases that are sensitive to methylation and for which recognition sites exist in the THAP- 
family gene (including in the flanking and intronic sequences). See, for example, Buiting et al. 
(1994) Human Mol Genet 3:893-895. Digested DNA is separated by gel electrophoresis, and 
hybridized with probes derived from, for example, genomic or cDNA sequences. The methylation 
status of the THAP-family gene can be determined by comparison of the restriction pattern 
generated from the sample DNA with that for a standard of known methylation. 
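The restriction-pattern comparison above can be modeled, in simplified form, by predicting fragment lengths with and without methylation. This sketch assumes the HpaII/MspI isoschizomer pair as an example (both cut C^CGG; HpaII is blocked by methylation of the internal C, MspI is not), treats the DNA as a single strand, and uses a hypothetical sequence and methylation state rather than a real THAP-family locus.

```python
SITE = "CCGG"  # HpaII/MspI recognition site; both cut between C and CGG


def site_positions(seq):
    """0-based start positions of every CCGG site in the sequence."""
    seq = seq.upper()
    return [i for i in range(len(seq) - len(SITE) + 1) if seq[i:i + len(SITE)] == SITE]


def digest(seq, methylated_sites=frozenset(), methylation_sensitive=True):
    """Return fragment lengths after cutting C^CGG at each site.

    A methylation-sensitive enzyme (HpaII-like) skips sites listed in
    methylated_sites (given as the 0-based start of CCGG); an insensitive
    isoschizomer (MspI-like) cuts every site.
    """
    cuts = [p + 1 for p in site_positions(seq)
            if not (methylation_sensitive and p in methylated_sites)]
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


seq = "AAAACCGGTTTTTTCCGGAAAA"  # hypothetical sequence with sites at 4 and 14
print(digest(seq, methylated_sites={14}))                               # HpaII-like
print(digest(seq, methylated_sites={14}, methylation_sensitive=False))  # MspI-like
```

A band pattern that differs between the sensitive and insensitive digests, or from a standard of known methylation, points to methylated sites in the region probed.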

Furthermore, gene constructs such as those described herein can be utilized in diagnostic 
assays to determine if a cell's growth or differentiation state is no longer dependent on the 
regulatory function of a THAP-family protein, e.g. in determining the phenotype of a transformed 
cell. Such knowledge can have both prognostic and therapeutic benefits. To illustrate, a sample of 
cells from the tissue can be obtained from a patient and dispersed in appropriate cell culture media, 
a portion of the cells in the sample can be caused to express a recombinant THAP-family protein or 
a THAP-family target protein, e.g. by transfection with an expression vector described herein, or to 
increase the expression or activity of an endogenous THAP-family protein or THAP-family target 
protein, and subsequent growth of the cells assessed. The absence of a change in phenotype of the 
cells despite expression of the THAP-family or THAP-family target protein may be indicative of a 
lack of dependence on cell regulatory pathways which include the THAP-family or THAP-family 
target protein, e.g. THAP-family- or THAP-family target-mediated transcription. Depending on the 
nature of the tissue of interest, the sample can be in the form of cells isolated from, for example, a 
blood sample, an exfoliated cell sample, a fine needle aspirate sample, or a biopsied tissue sample. 
Where the initial sample is a solid mass, the tissue sample can be minced or otherwise dispersed so 
that cells can be cultured, as is known in the art. 

In yet another embodiment, a diagnostic assay is provided which detects the ability of a 
THAP-family gene product, e.g., isolated from a biopsied cell, to bind to other cellular proteins. For 
instance, it will be desirable to detect THAP-family mutants which, while expressed at appreciable 
levels in the cell, are defective at binding a THAP-family target protein (having either diminished 
or enhanced binding affinity). Such mutants may arise, for example, from mutations, e.g., point 
mutations, which may be impractical to detect by the diagnostic DNA sequencing techniques or by 
the immunoassays described above. The present invention accordingly further contemplates 
diagnostic screening assays which generally comprise cloning one or more THAP-family genes 
from the sample cells, and expressing the cloned genes under conditions which permit detection of 
an interaction between that recombinant gene product and a target protein, e.g., the 
THAP1 gene and a target PAR4 protein or a PML-NB protein. As will be apparent from the 
description of the various drug screening assays set forth below, a wide variety of techniques can be 
used to determine the ability of a THAP-family protein to bind to other cellular components. These 
techniques can be used to detect mutations in a THAP-family gene which give rise to mutant 
proteins with a higher or lower binding affinity for a THAP-family target protein relative to the 
wild-type THAP-family. Conversely, by switching which of the THAP-family target protein and 
THAP-family protein is the "bait" and which is derived from the patient sample, the subject assay 
can also be used to detect THAP-family target protein mutants which have a higher or lower 
binding affinity for a THAP-family protein relative to a wild type form of that THAP-family target 
protein. 

In an exemplary embodiment, a PAR4 or a PML-NB protein (e.g. wild-type) can be 
provided as an immobilized protein (a "target"), such as by use of GST fusion proteins and 
glutathione-treated microtitre plates. A THAP1 gene (a "sample" gene) is amplified from cells of a 
patient sample, e.g., by PCR, ligated into an expression vector, and transformed into an appropriate 
host cell. The recombinantly produced THAP1 protein is then contacted with the immobilized 
PAR4 or PML-NB protein, e.g., as a lysate or a semi-purified preparation, the complex washed, and 
the amount of PAR4/THAP1 or PML-NB protein/THAP1 complex determined and compared to a level of 
wild-type complex formed in a control. Detection can be by, for instance, an immunoassay using 
antibodies against the wild-type form of the THAP1 protein, or by virtue of a label provided by 
cloning the sample THAP1 gene into a vector which provides the protein as a fusion protein 
including a detectable tag. For example, a myc epitope can be provided as part of a fusion protein 
with the sample THAP1 gene. Such fusion proteins can, in addition to providing a detectable label, 
also permit purification of the sample THAP1 protein from the lysate prior to application to the 
immobilized target. In yet another embodiment of the subject screening assay, the two hybrid 
assay, described in the appended examples, can be used to detect mutations in either a THAP- 
family gene or THAP-family target gene which alter complex formation between those two 
proteins. 
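The comparison of sample complex formation with the wild-type control can be sketched as background-subtracted binding expressed as a percentage of wild type. The signal values and the 50% / 150% cut-offs for diminished or enhanced binding are hypothetical assumptions for illustration; the specification gives no numeric criteria.

```python
def percent_of_wild_type(sample_signal, wt_signal, background):
    """Bound-complex signal of a sample THAP1 protein as % of the wild-type control.

    All signals are raw plate readings; background is a no-target or no-protein well.
    """
    wt_net = wt_signal - background
    if wt_net <= 0:
        raise ValueError("wild-type signal must exceed background")
    return 100.0 * max(sample_signal - background, 0.0) / wt_net


def call_binding(pct, low=50.0, high=150.0):
    """Label binding relative to wild type; low/high are hypothetical cut-offs."""
    if pct < low:
        return "diminished binding"
    if pct > high:
        return "enhanced binding"
    return "wild-type-like binding"


# Hypothetical readings: sample well, wild-type control well, background well.
pct = percent_of_wild_type(350, 1100, 100)
print(pct, call_binding(pct))  # 25.0 diminished binding
```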

Accordingly, the present invention provides a convenient method for detecting mutants of 
THAP-family genes encoding proteins which are unable to physically interact with a THAP-family 
target "bait" protein, which method relies on detecting the reconstitution of a transcriptional 
activator in a THAP-family/THAP-family target-dependent fashion. 

WO 03/051917 PCT/EP02/14027 
In one embodiment, the biological sample contains protein molecules from the test subject. 
Alternatively, the biological sample can contain mRNA molecules from the test subject or genomic 
DNA molecules from the test subject. A preferred biological sample is a serum sample isolated by 
conventional means from a subject. In another embodiment, the methods further involve obtaining 
a control biological sample from a control subject, contacting the control sample with a compound 
or agent capable of detecting a THAP-family protein, mRNA, or genomic DNA, such that the 
presence of a THAP-family protein, mRNA or genomic DNA is detected in the biological sample, 
and comparing the presence of a THAP-family protein, mRNA or genomic DNA in the control 
sample with the presence of a THAP-family protein, mRNA or genomic DNA in the test sample. 
The invention also encompasses kits for detecting the presence of THAP-family protein, mRNA or 
genomic DNA in a biological sample. For example, the kit can comprise a labeled compound or 
agent capable of detecting a THAP-family protein or mRNA or genomic DNA in a biological 
sample; means for determining the amount of a THAP-family member in the sample; and means for 
comparing the amount of THAP-family member in the sample with a standard. The compound or 
agent can be packaged in a suitable container. The kit can further comprise instructions for using 
the kit to detect THAP-family protein or nucleic acid. 
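A kit's "means for determining the amount" of a THAP-family member is often a standard curve. The sketch below assumes a linear signal response, fits the standards by ordinary least squares and inverts the line to quantify a sample; the standard concentrations and signals are hypothetical, and many immunoassays instead use a nonlinear (e.g. four-parameter logistic) curve.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def quantify(signal, slope, intercept):
    """Convert a sample signal into a concentration using the fitted standard curve."""
    return (signal - intercept) / slope


# Hypothetical standards: known THAP-family protein concentrations (ng/mL) and signals.
standards_conc = [0.0, 10.0, 20.0, 40.0]
standards_signal = [0.05, 0.25, 0.45, 0.85]
slope, intercept = fit_line(standards_conc, standards_signal)
amount = quantify(0.65, slope, intercept)
in_range = min(standards_conc) <= amount <= max(standards_conc)  # avoid extrapolating
print(round(amount, 3), in_range)  # 30.0 True
```

The quantified amount can then be compared with a reference value from control samples, as in the expression comparison described earlier.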

In certain embodiments, detection involves the use of a probe/primer in a polymerase chain 
reaction (PCR) (see, e.g., U.S. Pat. Nos. 4,683,195 and 4,683,202), such as anchor PCR or RACE 
PCR, or, alternatively, in a ligation chain reaction (LCR) (see, e.g., Landegren et al. (1988) Science 
241:1077-1080; and Nakazawa et al. (1994) PNAS 91:360-364), the latter of which can be 
particularly useful for detecting point mutations in the THAP-family-gene (see Abravaya et al. 
(1995) Nucleic Acids Res. 23:675-682). This method can include the steps of collecting a sample of 
cells from a patient, isolating nucleic acid (e.g., genomic, mRNA or both) from the cells of the 
sample, contacting the nucleic acid sample with one or more primers which specifically hybridize to 
a THAP-family gene under conditions such that hybridization and amplification of the THAP- 
family-gene (if present) occurs, and detecting the presence or absence of an amplification product, 
or detecting the size of the amplification product and comparing the length to a control sample. It is 
anticipated that PCR and/or LCR may be desirable to use as a preliminary amplification step in 
conjunction with any of the techniques used for detecting mutations described herein. 
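
Detecting the size of an amplification product and comparing it with a control can be illustrated
in silico. The Python sketch below predicts the product length from a template and a primer pair
(the forward primer matches the top strand; the reverse primer anneals as its reverse complement),
so that a deletion between the primers shows up as a shorter product. The sequences are short
hypothetical placeholders, not THAP-family sequences, and the exact-match rule ignores the
mismatch tolerance of real primers.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def amplicon_length(template: str, forward: str, reverse: str):
    """Predict the PCR product length, or None if either primer site is absent.

    Exact matching only: the forward primer must occur in the top strand and the
    reverse complement of the reverse primer must occur downstream of it.
    """
    start = template.find(forward)
    if start < 0:
        return None
    reverse_site = reverse_complement(reverse)
    end = template.find(reverse_site, start)
    if end < 0:
        return None
    return end + len(reverse_site) - start


# Hypothetical control allele and an allele with a 4-bp deletion between the primer sites
control = "TT" + "ATGCCC" + "GTGTGT" + "AAACCC" + "TT"
deletion = "TT" + "ATGCCC" + "GT" + "AAACCC" + "TT"
fwd, rev = "ATGCCC", "GGGTTT"  # hypothetical primers
print(amplicon_length(control, fwd, rev), amplicon_length(deletion, fwd, rev))
```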

Genotyping assays for diagnostics generally require prior amplification of the DNA
region carrying the biallelic marker of interest. However, ultrasensitive detection methods which
do not require amplification are also available. Methods well known to those skilled in the art that
can be used to detect biallelic polymorphisms include conventional dot blot analyses,
single strand conformational polymorphism analysis (SSCP) described by Orita et al.,
PNAS 86:2766-2770 (1989), denaturing gradient gel electrophoresis (DGGE), heteroduplex
analysis, mismatch cleavage detection, and other conventional techniques as described in Sheffield
et al. (1991), White et al. (1992), and Grompe et al. (1989 and 1993) (Sheffield, V.C. et al., Proc.
Natl. Acad. Sci. U.S.A. 49:699-706 (1991); White, M.B. et al., Genomics 12:301-306 (1992);
Grompe, M. et al., Proc. Natl. Acad. Sci. U.S.A. 86:5855-5892 (1989); and Grompe, M. Nature
Genetics 5:111-117 (1993)). Another method for determining the identity of the nucleotide present
at a particular polymorphic site employs a specialized exonuclease-resistant nucleotide derivative as
described in U.S. patent 4,656,127. Further methods are described as follows.

The nucleotide present at a polymorphic site can be determined by sequencing methods. In 
a preferred embodiment, DNA samples are subjected to PCR amplification before sequencing as 
described above. DNA sequencing methods are described in "Sequencing Of Amplified Genomic 
DNA And Identification Of Single Nucleotide Polymorphisms". Preferably, the amplified DNA is 
subjected to automated dideoxy terminator sequencing reactions using a dye-primer cycle 
sequencing protocol. Sequence analysis allows the identification of the base present at the biallelic 
marker site. 

In microsequencing methods, the nucleotide at a polymorphic site in a target DNA is
detected by a single nucleotide primer extension reaction. This method involves appropriate
microsequencing primers, which hybridize just upstream of the polymorphic base of interest in the
target nucleic acid. A polymerase is used to specifically extend the 3' end of the primer with one
single ddNTP (chain terminator) complementary to the nucleotide at the polymorphic site. Next, the
identity of the incorporated nucleotide is determined in any suitable way. Typically,
microsequencing reactions are carried out using fluorescent ddNTPs and the extended
microsequencing primers are analyzed by electrophoresis on ABI 377 sequencing machines to
determine the identity of the incorporated nucleotide as described in EP 412 883. Alternatively,
capillary electrophoresis can be used in order to process a higher number of assays simultaneously.
Different approaches can be used for the labeling and detection of ddNTPs. A homogeneous phase
detection method based on fluorescence resonance energy transfer has been described by Chen and
Kwok (Nucleic Acids Research 25:347-353, 1997) and Chen et al. (Proc. Natl. Acad. Sci. USA
94(20):10756-10761, 1997). In this method, amplified genomic DNA
fragments containing polymorphic sites are incubated with a 5'-fluorescein-labeled primer in the
presence of allelic dye-labeled dideoxyribonucleoside triphosphates and a modified Taq
polymerase. The dye-labeled primer is extended one base by the dye-terminator specific for the
allele present on the template. At the end of the genotyping reaction, the fluorescence intensities of
the two dyes in the reaction mixture are analyzed directly without separation or purification. All
these steps can be performed in the same tube and the fluorescence changes can be monitored in
real time. Alternatively, the extended primer may be analyzed by MALDI-TOF mass
spectrometry. The base at the polymorphic site is identified by the mass added onto the
microsequencing primer (see Haff and Smirnov, Genome Research 7:378-388, 1997). In
another example, Pastinen et al. (Genome Research 7:606-614, 1997) describe a method for
multiplex detection of single nucleotide polymorphisms in which the solid phase minisequencing
principle is applied to an oligonucleotide array format. High-density arrays of DNA probes
attached to a solid support (DNA chips) are further described below.
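
The base-pairing logic of a single-base extension (microsequencing) readout can be sketched in
Python: the primer anneals next to the polymorphic site, and the polymerase adds the one ddNTP
complementary to the template base at that site. The function name and sequences below are
hypothetical; the sketch models base pairing only, not dye chemistry, FRET detection or mass
spectrometry.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def incorporated_ddntp(template: str, primer: str):
    """Return the ddNTP added in a single-base primer extension, or None.

    `template` is the strand the primer anneals to, written 5'->3'. The primer pairs
    antiparallel with template[site:site + len(primer)], so its 3' end faces
    template[site - 1], the polymorphic base that the polymerase copies.
    """
    site = template.find(reverse_complement(primer))
    if site <= 0:
        return None  # primer does not anneal, or no template base lies beyond its 3' end
    return "dd" + template[site - 1].translate(COMPLEMENT) + "TP"


# Hypothetical template strands for two alleles differing at index 4 (A versus G)
allele_1 = "CCGTAGGCTTACG"
allele_2 = "CCGTGGGCTTACG"
primer = "CGTAAGCC"  # hypothetical primer: anneals to GGCTTACG, 3' end next to the SNP
print(incorporated_ddntp(allele_1, primer), incorporated_ddntp(allele_2, primer))
```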

Other assays include mismatch detection assays, based on the specificity of polymerases
and ligases. Polymerization reactions place particularly stringent requirements on correct base
pairing of the 3' end of the amplification primer, and the joining of two oligonucleotides hybridized
to a target DNA sequence is quite sensitive to mismatches close to the ligation site, especially at the
3' end.
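
The 3'-end rule for polymerase-based mismatch detection can be illustrated with a simplified model
of allele-specific primer extension, in which a primer is extended only if its 3'-terminal base pairs
correctly with the template. The tolerance of one internal mismatch (max_internal_mismatches),
the function name and the sequences are illustrative assumptions; real discrimination also depends
on mismatch type, temperature and enzyme. An analogous check could be written for the bases
flanking a ligation junction.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def primer_extends(template: str, primer: str, max_internal_mismatches: int = 1) -> bool:
    """Predict extension of `primer` on `template` (both 5'->3') under a 3'-end rule.

    The primer pairs with a template window equal to its reverse complement; the first
    base of that window pairs with the primer's 3'-terminal base and must match exactly.
    At least one template base must remain beyond the 3' end for the polymerase to copy.
    """
    site = reverse_complement(primer)
    n = len(site)
    for start in range(1, len(template) - n + 1):
        window = template[start:start + n]
        if window[0] != site[0]:
            continue  # 3'-terminal mismatch: assume no extension from this position
        mismatches = sum(a != b for a, b in zip(window[1:], site[1:]))
        if mismatches <= max_internal_mismatches:
            return True
    return False


allele_1 = "CCGTAGGCTTACG"  # hypothetical template carrying A at index 4
allele_2 = "CCGTGGGCTTACG"  # hypothetical template carrying G at index 4
primer_1 = "GTAAGCCT"  # hypothetical primer whose 3'-terminal T pairs with allele 1
primer_2 = "GTAAGCCC"  # hypothetical primer whose 3'-terminal C pairs with allele 2
for name, template in (("allele 1", allele_1), ("allele 2", allele_2)):
    print(name, primer_extends(template, primer_1), primer_extends(template, primer_2))
```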

A preferred method of determining the identity of the nucleotide present at an allele
involves nucleic acid hybridization. Any hybridization assay may be used, including Southern
hybridization, Northern hybridization, dot blot hybridization and solid-phase hybridization (see
Sambrook et al., Molecular Cloning - A Laboratory Manual, Second Edition, Cold Spring Harbor
Press, N.Y., 1989). Hybridization refers to the formation of a duplex structure by two
single-stranded nucleic acids due to complementary base pairing. Hybridization can occur between
exactly complementary nucleic acid strands or between nucleic acid strands that contain minor
regions of mismatch. Specific probes can be designed that hybridize to one form of a biallelic
marker and not to the other and are therefore able to discriminate between different allelic forms.
Allele-specific probes are often used in pairs, one member of a pair showing a perfect match to a
target sequence containing the original allele and the other showing a perfect match to the target
sequence containing the alternative allele. Hybridization conditions should be sufficiently stringent
that there is a significant difference in hybridization intensity between alleles, and preferably an
essentially binary response, whereby a probe hybridizes to only one of the alleles. Stringent,
sequence-specific hybridization conditions, under which a probe will hybridize only to the exactly
complementary target sequence, are well known in the art (Sambrook et al., 1989). The detection of
hybrid duplexes can be carried out by a number of methods. Various detection assay formats are
well known which utilize detectable labels bound to either the target or the probe to enable
detection of the hybrid duplexes. Typically, hybridization duplexes are separated from
unhybridized nucleic acids and the labels bound to the duplexes are then detected. Further, standard
heterogeneous assay formats are suitable for detecting the hybrids using the labels present on the
primers and probes (see Landegren U. et al., Genome Research 8:769-776, 1998).
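
Where allele-specific probes are used in pairs, the two hybridization signals can be turned into a
genotype call. The Python sketch below uses the fraction of background-corrected signal
contributed by the allele 1 probe; the thresholds (min_signal, homozygous_fraction), the function
name and the intensities in the usage lines are illustrative assumptions that would have to be set
empirically for a given assay.

```python
def call_genotype(signal_1: float, signal_2: float, background: float = 0.0,
                  min_signal: float = 100.0, homozygous_fraction: float = 0.8) -> str:
    """Call a biallelic genotype from the intensities of two allele-specific probes."""
    s1 = max(signal_1 - background, 0.0)
    s2 = max(signal_2 - background, 0.0)
    total = s1 + s2
    if total < min_signal:
        return "no call"  # too little hybridization to decide
    fraction_1 = s1 / total
    if fraction_1 >= homozygous_fraction:
        return "1/1"  # essentially only the allele 1 probe hybridized
    if fraction_1 <= 1.0 - homozygous_fraction:
        return "2/2"  # essentially only the allele 2 probe hybridized
    return "1/2"  # both probes hybridized: heterozygote


# Hypothetical (allele 1 probe, allele 2 probe) intensities
for pair in [(950, 50), (40, 900), (500, 450), (30, 20)]:
    print(pair, call_genotype(*pair))
```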

Hybridization assays based on oligonucleotide arrays rely on the differences in 
hybridization stability of short oligonucleotides to perfectly matched and mismatched target 
sequence variants. Efficient access to polymorphism information is obtained through a basic 
structure comprising high-density arrays of oligonucleotide probes attached to a solid support (e.g., 
the chip) at selected positions. Chips of various formats for use in detecting biallelic polymorphisms 
can be produced on a customized basis by Affymetrix (GeneChip), Hyseq (HyChip and 
HyGnostics), and Protogene Laboratories. 



In general, these methods employ arrays of oligonucleotide probes that are complementary
to target nucleic acid sequence segments from an individual, which target sequences include a
polymorphic marker. EP 785280 describes a tiling strategy for the detection of single nucleotide
polymorphisms. Briefly, arrays may generally be "tiled" for a large number of specific
polymorphisms, as further described in PCT application No. WO 95/11995. Upon completion of
hybridization with the target sequence and washing of the array, the array is scanned to determine
the position on the array to which the target sequence hybridizes. The hybridization data from the
scanned array are then analyzed to identify which allele or alleles of the biallelic marker are present
in the sample. Hybridization and scanning may be carried out as described in PCT applications Nos.
WO 92/10092 and WO 95/11995 and US patent No. 5,424,186. Solid supports and polynucleotides
of the present invention attached to solid supports are further described in "Oligonucleotide Probes
And Primers".
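
The general idea of tiling probes around a polymorphic site can be sketched as follows. For several
placements of the polymorphic position within a probe, four probes are made that differ only at
that position (A, C, G or T), each complementary to the target; after hybridization, the strongest
probe in a group of four indicates the base present. This is a simplified, generic illustration, not
the specific probe design of EP 785280 or WO 95/11995; the probe length, shift values, function
names and intensities are assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def tile_probes(target: str, snp_index: int, probe_length: int = 9, shifts=(-2, 0, 2)):
    """Return {(shift, base): probe} for a simple four-way tiling around one site.

    For each shift, the window start is moved by `shift` bases relative to a window
    centred on the polymorphic position; the polymorphic base is replaced by each of
    A, C, G and T, and the probe is the reverse complement of that window.
    """
    probes = {}
    for shift in shifts:
        start = snp_index - probe_length // 2 + shift
        if start < 0 or start + probe_length > len(target):
            continue  # this window would run off the end of the target
        window = list(target[start:start + probe_length])
        for base in "ACGT":
            window[snp_index - start] = base
            probes[(shift, base)] = reverse_complement("".join(window))
    return probes


def read_base(intensities: dict) -> str:
    """Return the target base whose probe gave the strongest signal."""
    return max(intensities, key=intensities.get)


target = "ACGTACGTACGTACGTA"  # hypothetical target; polymorphic site at index 8
probes = tile_probes(target, snp_index=8)
print(len(probes), probes[(0, "A")])
print(read_base({"A": 120, "C": 900, "G": 80, "T": 60}))  # hypothetical intensities
```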

Having generally described this invention, a further understanding can be obtained by 
reference to certain specific examples which are provided herein for purposes of illustration only, 
and are not intended to be limiting unless otherwise specified. 

EXAMPLES 



EXAMPLE 1 

Isolation of the THAP1 cDNA in a two-hybrid screen with chemokine SLC/CCL21
In an effort to define the function of novel HEVEC proteins and the cellular pathways 
involved, we used different baits to screen a two-hybrid cDNA library generated from 
microvascular human HEV endothelial cells (HEVEC). HEVEC were purified from human tonsils 
by immunomagnetic selection with monoclonal antibody MECA-79 as previously described (Girard 
and Springer (1995) Immunity 2:113-123). The SMART PCR cDNA library Construction Kit 
(Clontech, Palo Alto, CA USA) was first used to generate full-length cDNAs from 1 µg HEVEC
total RNA. Oligo-dT-primed HEVEC cDNAs were then digested with SfiI and directionally cloned
into pGAD424-Sfi, a two-hybrid vector generated by inserting an SfiI linker (5'-
GAATTCGGCCATTATGGCCTGCAGGATCC GGCCGCCTCGGCC CArraATrr^'^ (SEQ ID 
NO: 181) between the EcoRI and BamHI cloning sites of pGAD424 (Clontech). The resulting
pGAD424-HEVEC cDNA two-hybrid library (mean insert size > 1 kb, ~3 x 10^6 independent clones)
was amplified in E. coli. To identify potential protein partners of chemokine SLC/6Ckine,
screening of the two-hybrid HEVEC cDNA library was performed using as bait a cDNA encoding
the mature form of human SLC/CCL21 (amino acids 24-134, GenBank Accession No: NP_002980,
SEQ ID NO: 182), amplified by PCR from HEVEC RNA with primers hSLC.5' (5'-
GCGGGATCCGTAGTGATGGAGGGGCTCAGGACTGTTG-3') (SEQ ID NO: 183) and
hSLC.3' (5'-GCGGGATCCCTATGGCCCTTTAGGGGTCTGTGACC-3') (SEQ ID NO: 184),
digested with BamHI and inserted into the BamHI cloning site of the MATCHMAKER two-hybrid
system 2 vector pGBT9 (Clontech). Briefly, pGBT9-SLC was cotransformed with the
pGAD424-HEVEC cDNA library in yeast strain Y190 (Clontech). 1.5 x 10^7 yeast transformants
were screened and positive protein interactions were selected by His auxotrophy. The plates were
incubated at 30°C for 5 days. Plasmid DNA was extracted from positive colonies and used to verify
the specificity of the interaction by cotransformation in AH109 with pGBT9-SLC or the control baits
pGBT9 and pGBT9-lamin. Eight independent clones isolated in this two-hybrid screen were
characterized. They were found to correspond to a unique human cDNA encoding a novel human
protein of 213 amino acids, designated THAP1, that exhibits 93% identity with its mouse
orthologue (Figure 1A). The only noticeable motifs in the THAP1 predicted protein sequence were
a short proline-rich domain in the middle part and a consensus nuclear localization sequence (NLS)
in the carboxy-terminal part (Figure 1B). Database searches with the THAP1 sequence failed to
reveal any significant similarity to previously characterized proteins, with the exception of the first
90 amino acids, which may define a novel protein motif associated with apoptosis, hereafter
referred to as the THAP domain (see Figure 1B, Figures 9A-9C, and Figure 10).
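
The depth of such a screen can be estimated from the figures given above: about 1.5 x 10^7
transformants screened against a library of about 3 x 10^6 independent clones. Assuming, as a
simplification, that all clones are equally represented and sampled at random, the number of times
a given clone is screened is approximately Poisson-distributed with a mean equal to the fold
coverage, so the chance of screening it at least once is 1 - exp(-coverage). The sketch below is an
illustrative calculation only, not part of the original protocol; the function name is hypothetical.

```python
import math


def screen_coverage(transformants: float, independent_clones: float):
    """Return (fold coverage, probability that a given clone is screened at least once)."""
    fold = transformants / independent_clones
    return fold, 1.0 - math.exp(-fold)


fold, p_screened = screen_coverage(1.5e7, 3e6)
print(f"{fold:.1f}-fold coverage, P(clone screened) = {p_screened:.3f}")
```

With these numbers the coverage is 5-fold and the probability is about 0.993; unequal clone
abundance after library amplification would lower the effective value for rare cDNAs.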

EXAMPLE 2 

Northern Blot 

To determine the tissue distribution of THAP1 mRNA, we performed Northern blot 
analysis of 12 different adult human tissues (Fig 2). Multiple Human Tissues Northern Blots
(CLONTECH) were hybridized according to the manufacturer's instructions. The probe was a PCR
product corresponding to the THAP1 ORF, 32P-labeled with the Prime-a-Gene Labeling System
(PROMEGA). A 2.2-kb mRNA band was detected in brain, heart, skeletal muscle, kidney, liver, and
placenta. In addition to the major 2.2 kb band, lower molecular weight bands were detected, that are 
likely to correspond to alternative splicing or polyadenylation of the THAP1 pre-mRNA. The 
presence of THAP1 mRNAs in many different tissues suggests that THAP1 has a widespread, 
although not ubiquitous, tissue distribution in the human body. 

EXAMPLE 3 
Analysis of the subcellular localization of THAP1
To analyze the subcellular localization of the THAP1 protein, the THAP1 cDNA was fused 
to the coding sequence of GFP (Green Fluorescent Protein). The full-length coding region of 
THAP1 was amplified by PCR from HEVEC cDNA with primers 2HMR10 (5'-
CCGAATTCAGGATGGTGCAGTCCTGCTCCGCCT-3') (SEQ ID NO: 185) and 2HMR9 (5'-
CGCGGATCCTGCTGGTACTTCAACTATTTCAAAGTAGTC-3') (SEQ ID NO: 186), digested
with EcoRI and BamHI, and cloned in frame downstream of the Enhanced Green Fluorescent
Protein (EGFP) ORF in the pEGFP.C2 vector (Clontech) to generate pEGFP.C2-THAP1. The
GFP/THAP1 expression construct was then transfected into human primary endothelial cells from
umbilical vein (HUVEC, PromoCell, Heidelberg, Germany). HUVEC were grown in complete
ECGM medium (PromoCell, Heidelberg, Germany), plated on coverslips and transiently transfected
in RPMI medium using GeneJammer transfection reagent according to the manufacturer's instructions
(Stratagene, La Jolla, CA, USA). Analysis by fluorescence microscopy 24 h later revealed that the
GFP/THAP1 fusion protein localizes exclusively in the nucleus with both a diffuse distribution and 
an accumulation into speckles while GFP alone exhibits only a diffuse staining over the entire cell. 
To investigate the identity of the speckled domains with which GFP/THAP1 associates, we used 
indirect immunofluorescence microscopy to examine a possible colocalization of the nuclear dots 
containing GFP/THAP1 with known nuclear domains (replication factories, splicing centers, 
nuclear bodies). 
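
The cloning strategy relies on the EcoRI (GAATTC) and BamHI (GGATCC) recognition sites built
into the 5' tails of primers 2HMR10 and 2HMR9. A quick check of primer design, sketched in
Python below, locates these sites in the primer sequences given above; the helper name find_sites
is hypothetical, and the check does not verify reading frame or cleavage efficiency near the ends of
a PCR product.

```python
SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC"}


def find_sites(primer: str, sites: dict = SITES) -> dict:
    """Return {enzyme: 0-based position of its first recognition site} for sites present."""
    return {name: primer.find(motif) for name, motif in sites.items() if motif in primer}


primers = {
    "2HMR10": "CCGAATTCAGGATGGTGCAGTCCTGCTCCGCCT",
    "2HMR9": "CGCGGATCCTGCTGGTACTTCAACTATTTCAAAGTAGTC",
}
for name, seq in primers.items():
    print(name, find_sites(seq))
```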

Cells transfected with GFP-tagged expression constructs were allowed to grow for 24 h to 
48 h on coverslips. Cells were washed twice with PBS, fixed for 15 min at room temperature in 
PBS containing 3.7% formaldehyde, and washed again with PBS prior to neutralization with 50 mM
NH4Cl in PBS for 5 min at room temperature. Following one more PBS wash, cells were
permeabilized for 5 min at room temperature in PBS containing 0.1% Triton X-100, and washed
again with PBS. Permeabilized cells were then blocked with PBS-BSA (PBS with 1% bovine serum
albumin) for 10 min and then incubated for 2 h at room temperature with the following primary
antibodies diluted in PBS-BSA: rabbit polyclonal antibodies against human Daxx (1/50, M-112, Santa Cruz
Biotechnology) or mouse monoclonal antibodies anti-PML (mouse IgG1, 1/30, mAb PG-M3 from
Dako, Glostrup, Denmark). Cells were then washed three times 5 min at room temperature in PBS- 
BSA, and incubated for 1 hr with Cy3 (red fluorescence)-conjugated goat anti-mouse or anti-rabbit 
IgG (1/1000, Amersham Pharmacia Biotech) secondary antibodies, diluted in PBS-BSA. After 
extensive washing in PBS, samples were air dried and mounted in Mowiol. Images were collected 
on a Leica confocal laser scanning microscope. The GFP (green) and Cy3 (red) fluorescence signals 
were recorded sequentially for identical image fields to avoid cross-talk between the channels. 

This analysis revealed that GFP-THAP1 staining exhibits a complete overlap with the 
staining pattern obtained with antibodies directed against PML. The colocalization of GFP/THAP1 
and PML was observed both in nuclei with few PML-NBs (fewer than ten) and in nuclei with a large
number of PML-NBs. Indirect immunofluorescence staining with antibodies directed against Daxx, 
another well characterized component of PML-NBs, was performed to confirm the association of 
GFP/THAP1 with PML-NBs. We found a complete colocalization of GFP/THAP1 and Daxx in 
PML-NBs. Together, these results reveal that THAP1 is a novel protein associated with PML-NBs. 
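
The colocalization reported here is described qualitatively. If a quantitative measure were wanted,
one widely used option (not the method used in these examples) is Pearson's correlation coefficient
between the green (GFP) and red (Cy3) intensities of corresponding pixels in the sequentially
recorded channels. The pure-Python sketch below assumes the two images have already been
flattened into equal-length intensity lists; the function name and pixel values are hypothetical.

```python
import math


def pearson_colocalization(green, red):
    """Pearson's r between two equal-length lists of pixel intensities."""
    if len(green) != len(red) or len(green) < 2:
        raise ValueError("need two equal-length channels with at least two pixels")
    n = len(green)
    mean_g = sum(green) / n
    mean_r = sum(red) / n
    cov = sum((g - mean_g) * (r - mean_r) for g, r in zip(green, red))
    var_g = sum((g - mean_g) ** 2 for g in green)
    var_r = sum((r - mean_r) ** 2 for r in red)
    return cov / math.sqrt(var_g * var_r)


# Hypothetical pixels: nuclear-body pixels are bright in both channels
green = [10, 12, 200, 210, 11, 190]
red = [8, 9, 180, 205, 10, 175]
print(round(pearson_colocalization(green, red), 3))
```

Values close to +1 indicate that the two signals rise and fall together pixel by pixel; in practice,
background subtraction and restriction to nuclear pixels strongly affect the result.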

EXAMPLE 4 

Identification of proteins interacting with THAP1 in human HEVECs: two-hybrid assay
THAP1 forms a complex with the pro-apoptotic protein PAR4 

To identify potential protein partners of THAP1, screening of the two-hybrid HEVEC
cDNA library was performed using as bait the human THAP1 full-length cDNA inserted into the
MATCHMAKER two-hybrid system 3 vector pGBKT7 (Clontech). Briefly, the full-length coding
region of THAP1 was amplified by PCR from HEVEC cDNA with primers 2HMR10 (5'-
CCGAATTCAGGATGGTGCAGTCCTGCTCCGCCT-3') (SEQ ID NO: 187) and 2HMR9 (5'-
CGCGGATCCTGCTGGTACTTCAACTATTTCAAAGTAGTC-3') (SEQ ID NO: 188), digested
with EcoRI and BamHI, and cloned in frame downstream of the Gal4 Binding Domain (Gal4-BD)
in the pGBKT7 vector to generate pGBKT7-THAP1. pGBKT7-THAP1 was then cotransformed with
the pGAD424-HEVEC cDNA library in yeast strain AH109 (Clontech). 1.5 x 10^7 yeast transformants
were screened and positive protein interactions were selected by His and Ade double auxotrophy
according to the manufacturer's instructions (MATCHMAKER two-hybrid system 3, Clontech). The
plates were incubated at 30°C for 5 days. Plasmid DNA was extracted from these positive colonies
and used to verify the specificity of the interaction by cotransformation in AH109 with
pGBKT7-THAP1 or the control baits pGBKT7, pGBKT7-lamin and pGBKT7-hevin. Three clones which
specifically interacted with THAP1 were obtained in the screen; sequencing of these clones
revealed three identical library plasmids that corresponded to a partial cDNA coding for the last 147
amino acids (positions 193-342) of the human pro-apoptotic protein PAR4 (Fig 3A). Positive
interaction between THAP1 and Par4 was confirmed using full-length Par4 bait (pGBKT7-Par4) and
prey (pGADT7-Par4). Full-length human Par4 was amplified by PCR from human thymus cDNA
(Clontech) with primers Par4.8 (5'-GCGGAATTCATGGCGACCGGTGGCTACCGGACC-3')
(SEQ ID NO: 189) and Par4.5 (5'-GCGGGATCCCTCTACCTGGTCAGCTGACCCACAAC-3')
(SEQ ID NO: 190), digested with EcoRI and BamHI, and cloned into the pGBKT7 and pGADT7
vectors to generate pGBKT7-Par4 and pGADT7-Par4. Positive interaction between THAP1 and Par4
was confirmed by cotransformation of AH109 with pGBKT7-THAP1 and pGADT7-Par4 or
pGBKT7-Par4 and pGADT7-THAP1 and selection of transformants by His and Ade double auxotrophy
according to the manufacturer's instructions (MATCHMAKER two-hybrid system 3, Clontech). To
generate pGADT7-THAP1, the full-length coding region of THAP1 was amplified by PCR from
HEVEC cDNA with primers 2HMR10 (5'-CCGAATTCAGGATGGTGCAGTCCTGCTCCGCCT-3')
(SEQ ID NO: 191) and 2HMR9 (5'-CGCGGATCCTGCTGGTACTTCAACTATTTCAAAGTAGTC-3')
(SEQ ID NO: 192), digested with EcoRI and BamHI, and cloned in frame downstream of the
Gal4 Activation Domain (Gal4-AD) in the pGADT7 two-hybrid vector (Clontech).
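
The specificity test described above (growth with the THAP1 bait but not with the control baits
pGBKT7, pGBKT7-lamin and pGBKT7-hevin) amounts to a simple filter over growth results under
His/Ade double selection. The sketch below assumes growth is scored as yes/no for each (prey
clone, bait) pair; the function name, clone names and scores are hypothetical.

```python
def specific_interactors(growth: dict, bait: str, control_baits) -> list:
    """Return prey clones that grow with `bait` but with none of the control baits.

    `growth` maps (prey, bait_name) -> True if cotransformants grew under His/Ade
    double selection; missing entries are treated as no growth.
    """
    preys = sorted({prey for prey, _ in growth})
    hits = []
    for prey in preys:
        with_bait = growth.get((prey, bait), False)
        with_control = any(growth.get((prey, c), False) for c in control_baits)
        if with_bait and not with_control:
            hits.append(prey)
    return hits


controls = ("pGBKT7", "pGBKT7-lamin", "pGBKT7-hevin")
growth = {  # hypothetical growth scores
    ("clone_1", "pGBKT7-THAP1"): True,   # grows with THAP1 only: specific
    ("clone_2", "pGBKT7-THAP1"): True,   # also grows with lamin: non-specific
    ("clone_2", "pGBKT7-lamin"): True,
    ("clone_3", "pGBKT7-THAP1"): False,  # no interaction
}
print(specific_interactors(growth, "pGBKT7-THAP1", controls))
```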

We then examined whether the leucine zipper/death domain at the C-terminus of Par4,
previously shown to be involved in Par4 binding to WT-1 and aPKC, was required for the
interaction between THAP1 and Par4. Two Par4 mutants were constructed for that purpose, Par4Δ
and Par4DD. Par4Δ lacks the leucine zipper/death domain while Par4DD contains this domain.
pGBKT7-Par4Δ (amino acids 1-276) and pGADT7-Par4Δ were constructed by sub-cloning an
EcoRI-BglII fragment from pGADT7-Par4 into the EcoRI and BamHI sites of pGBKT7 and
pGADT7. Par4DD (amino acids 250-342) was amplified by PCR, using pGBKT7-Par4 as
template, with primers Par4A (5'-
CGCGAATTCGCCATCATGGGGTTCCCTAGATATAACAGGGATGCAA-3') (SEQ ID NO:
193) and Par4.5, and cloned into the EcoRI and BamHI sites of pGBKT7 and pGADT7 to obtain
pGBKT7-Par4DD and pGADT7-Par4DD. Two-hybrid interaction between THAP1 and the Par4
mutants was tested by cotransformation of AH109 with pGBKT7-THAP1 and pGADT7-Par4Δ or
pGADT7-Par4DD and selection of transformants by His and Ade double auxotrophy according to
the manufacturer's instructions (MATCHMAKER two-hybrid system 3, Clontech). We found that the
Par4 leucine zipper/death domain (Par4DD) is not only required but also sufficient for the
interaction with THAP1 (Fig 3A). Similar results were obtained when two-hybrid experiments were
performed in the opposite orientation using Par4 or Par4 mutants (Par4Δ and Par4DD) as baits
instead of THAP1 (Fig 3A).

EXAMPLE 5 
In vitro THAP1/Par4 interaction assay 
To confirm the interaction observed in yeast, we performed in vitro GST pull-down assays.
Par4DD, expressed as a GST-tagged fusion protein and immobilized on glutathione sepharose, was
incubated with radiolabeled in vitro translated THAP1. To generate the GST-Par4DD expression
vector, Par4DD (amino acids 250-342) was amplified by PCR with primers Par4.10 (5'-
GCCGGATCCGGGTTCCCTAGATATAACAGGGATGCAA-3') (SEQ ID NO: 194) and Par4.5,
and cloned in frame downstream of the glutathione S-transferase ORF, into the BamHI site of the
pGEX-2T prokaryotic expression vector (Amersham Pharmacia Biotech, Saclay, France). The
GST-Par4DD (amino acids 250-342) fusion protein encoded by plasmid pGEX-2T-Par4DD and the
control GST protein encoded by plasmid pGEX-2T were then expressed in E. coli DH5α and purified
by affinity chromatography with glutathione sepharose according to the supplier's instructions
(Amersham Pharmacia Biotech). The yield of proteins used in GST pull-down assays was determined
by SDS-polyacrylamide gel electrophoresis (PAGE) and Coomassie blue staining analysis.
In vitro-translated THAP1 was generated with the TNT-coupled reticulocyte lysate system (Promega,
Madison, WI, USA) using the pGBKT7-THAP1 vector as template. 25 µl of 35S-labelled wild-type
THAP1 was incubated with immobilized GST-Par4 or GST proteins overnight at 4 °C, in the
following binding buffer: 10 mM NaPO4 pH 8.0, 140 mM NaCl, 3 mM MgCl2, 1 mM
dithiothreitol (DTT), 0.05% NP40, 0.2 mM phenylmethyl sulphonyl fluoride (PMSF), 1 mM
Na vanadate, 50 mM β-glycerophosphate, 25 µg/ml chymotrypsin, 5 µg/ml aprotinin and
10 µg/ml leupeptin. Beads were then washed 5 times in 1 ml binding buffer. Bound proteins were
eluted with 2X Laemmli SDS-PAGE sample buffer, fractionated by 10% SDS-PAGE and visualized by
fluorography using Amplify (Amersham Pharmacia Biotech). As expected, GST/Par4DD interacted
with THAP1 (Fig 3B). In contrast, THAP1 failed to interact with GST beads.
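
Preparing the binding buffer from concentrated stocks is a direct application of C1V1 = C2V2, that
is, V_stock = C_final x V_final / C_stock. The sketch below computes stock volumes for 10 ml of the
buffer listed above. The stock concentrations are assumptions chosen for illustration (they are not
given in the text), the function name is hypothetical, and each final/stock pair must use the same
unit.

```python
def stock_volumes_ul(recipe: dict, final_volume_ml: float) -> dict:
    """Volume (ul) of each stock for `final_volume_ml` of buffer, using C1*V1 = C2*V2."""
    final_ul = final_volume_ml * 1000
    return {name: final * final_ul / stock for name, (final, stock) in recipe.items()}


# component: (final concentration, ASSUMED stock concentration), same unit within a pair
recipe = {
    "NaPO4 pH 8.0 (mM)": (10, 1000),
    "NaCl (mM)": (140, 5000),
    "MgCl2 (mM)": (3, 1000),
    "DTT (mM)": (1, 1000),
    "NP40 (%)": (0.05, 10),
    "PMSF (mM)": (0.2, 100),
    "Na vanadate (mM)": (1, 100),
    "beta-glycerophosphate (mM)": (50, 1000),
    "chymotrypsin (ug/ml)": (25, 1000),
    "aprotinin (ug/ml)": (5, 1000),
    "leupeptin (ug/ml)": (10, 1000),
}
volumes = stock_volumes_ul(recipe, final_volume_ml=10)
for name, vol in volumes.items():
    print(f"{name:28s} {vol:7.1f} ul")
print(f"{'water to 10 ml':28s} {10000 - sum(volumes.values()):7.1f} ul")
```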

EXAMPLE 6
In vivo THAP1/Par4 interaction assay 
To provide further evidence for a physiological interaction between THAP1 and Par4, the
interaction between THAP1 and PAR4 was investigated in vivo. For that purpose, confocal
immunofluorescence microscopy was used to analyze the subcellular localization of epitope-tagged
Par4DD in primary human endothelial cells transiently cotransfected with the pEF-mycPar4DD
eukaryotic expression vector and GFP or GFP-THAP1 expression vectors (pEGFP.C2 and
pEGFP.C2-THAP1, respectively). To generate pEF-mycPar4DD, mycPar4DD (amino acids
250-342) was amplified by PCR using pGBKT7-Par4DD as template, with primers myc.BD7 (5'-
GCGCTCTAGAGCCATCATGGAGGAGCAGAAGCTGATC-3') (SEQ ID NO: 195) and Par4.9
(5'-CTTGCGGCCGCCTCTACCTGGTCAGCTGACCCACAAC-3') (SEQ ID NO: 196), and
cloned into the XbaI and NotI sites of the pEF-BOS expression vector (Mizushima and Nagata,
Nucleic Acids Research, 18:5322, 1990). Primary human endothelial cells from umbilical vein
(HUVEC, PromoCell, Heidelberg, Germany) were grown in complete ECGM medium (PromoCell,
Heidelberg, Germany), plated on coverslips and transiently transfected in RPMI medium using
GeneJammer transfection reagent according to the manufacturer's instructions (Stratagene, La Jolla,
CA, USA). Cells co-transfected with pEF-mycPar4DD and GFP-tagged expression constructs were
allowed to grow for 24 h to 48 h on coverslips. Cells were washed twice with PBS, fixed for 15
min at room temperature in PBS containing 3.7% formaldehyde, and washed again with PBS prior
to neutralization with 50 mM NH4Cl in PBS for 5 min at room temperature. Following one more
PBS wash, cells were permeabilized for 5 min at room temperature in PBS containing 0.1% Triton
X-100, and washed again with PBS. Permeabilized cells were then blocked with PBS-BSA (PBS
with 1% bovine serum albumin) for 10 min and then incubated for 2 h at room temperature with
mouse monoclonal anti-myc epitope antibody (mouse IgG1, 1/200, Clontech) diluted in PBS-BSA.
Cells were then washed three times for 5 min at room temperature in PBS-BSA, and incubated for
1 h with Cy3 (red fluorescence)-conjugated goat anti-mouse (1/1000, Amersham Pharmacia Biotech)
secondary antibodies, diluted in PBS-BSA. After extensive washing in PBS, samples were air dried
and mounted in Mowiol. Images were collected on a Leica confocal laser scanning microscope.
The GFP (green) and Cy3 (red) fluorescence signals were recorded sequentially for identical image
fields to avoid cross-talk between the channels.

In cells transiently co-transfected with pEF-mycPar4DD and GFP expression vector,
ectopically expressed myc-Par4DD was found to accumulate both in the cytoplasm and the nucleus
of the majority of the cells. In contrast, transient cotransfection of pEF-mycPar4DD and
GFP-THAP1 expression vectors dramatically shifted myc-Par4DD from a diffuse cytosolic and nuclear
localization to a preferential association with PML-NBs. The effect of GFP-THAP1 on
myc-Par4DD localization was specific since it was not observed with GFP-APS kinase-1 (APSK-1), a
nuclear enzyme unrelated to THAP1 and apoptosis [Besset et al., Faseb J, 14:345-354, 2000]. This
latter result shows that GFP-THAP1 recruits myc-Par4DD to PML-NBs and provides in vivo
evidence for a direct interaction of THAP1 with the pro-apoptotic protein Par4.

EXAMPLE 7 

Identification of a novel arginine-rich Par4 binding motif 
To identify the sequences mediating THAP1 binding to Par4, a series of THAP1 deletion
constructs was generated. Both amino-terminal (THAP1-C1, -C2, -C3) and carboxy-terminal
(THAP1-N1, -N2, -N3) deletion mutants (Figure 4A) were amplified by PCR using plasmid
pEGFP.C2-THAP1 as a template and the following primers:

2HMR12 (5'-GCGGAATTCAAAGAAGATCTTCTGGAGCCACAGGAAC-3') (SEQ ID NO: 197)
and 2HMR9 (5'-CGCGGATCCTGCTGGTACTTCAACTATTTCAAAGTAGTC-3') (SEQ ID NO: 198)
for THAP1-C1 (amino acids 90-213);

PAPM2 (5'-GCGGAATTCATGCCGCCTCTTCAGACCCCTGTTAA-3') (SEQ ID NO: 199)
and 2HMR9 for THAP1-C2 (amino acids 120-213);

PAPM3 (5'-GCGGAATTCATGCACCAGCGGAAAAGGATTCATCAG-3') (SEQ ID NO: 200)
and 2HMR9 for THAP1-C3 (amino acids 143-213);

2HMR10 (5'-CCGAATTCAGGATGGTGCAGTCCTGCTCCGCCT-3') (SEQ ID NO: 201)
and 2HMR17 (5'-GCGGGATCCCTTGTCATGTGGCTCAGTACAAAGAAATAT-3') (SEQ ID NO: 202)
for THAP1-N1 (amino acids 1-90);

2HMR10 and PAPN2 (5'-CGGGATCCTGTGCGGTCTTGAGCTTCTTTCTGAG-3') (SEQ ID NO: 203)
for THAP1-N2 (amino acids 1-166); and

2HMR10 and PAPN3 (5'-GCGGGATCCGTCGTCTTTCTCTTTCTGGAAGTGAAC-3') (SEQ ID NO: 204)
for THAP1-N3 (amino acids 1-192).

The PCR fragments thus obtained were digested with EcoRI and BamHI, and cloned in
frame downstream of the Gal4 Binding Domain (Gal4-BD) in the pGBKT7 two-hybrid vector
(Clontech) to generate pGBKT7-THAP1-C1, -C2, -C3, -N1, -N2 or -N3, or downstream of the
Enhanced Green Fluorescent Protein (EGFP) ORF in the pEGFP.C2 vector (Clontech) to generate
pEGFP.C2-THAP1-C1, -C2, -C3, -N1, -N2 or -N3.

Two-hybrid interaction between THAP1 mutants and Par4DD was tested by
cotransformation of AH109 with pGBKT7-THAP1-C1, -C2, -C3, -N1, -N2 or -N3 and
pGADT7-Par4DD and selection of transformants by His and Ade double auxotrophy according to the
manufacturer's instructions (MATCHMAKER two-hybrid system 3, Clontech). Positive two-hybrid
interaction with Par4DD was observed with mutants THAP1-C1, -C2, -C3 and -N3 but not with
mutants THAP1-N1 and -N2, suggesting that the Par4 binding site is located between THAP1
residues 143 and 192.
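
The reasoning used to localize the Par4 binding site can be written down explicitly. Assuming a
single contiguous binding site, the site must lie within every construct that binds, so intersecting
the residue ranges of the positive constructs gives the candidate region, and each negative construct
should lack at least part of that region. Applied to the construct boundaries and two-hybrid results
above, the Python sketch below returns residues 143-192; the function name is hypothetical.

```python
def map_binding_region(constructs: dict, binds: dict):
    """Intersect inclusive residue ranges of binding constructs and check the negatives."""
    positives = [constructs[name] for name, ok in binds.items() if ok]
    start = max(s for s, _ in positives)
    end = min(e for _, e in positives)
    if start > end:
        return None  # positives share no residues: not a single contiguous site
    for name, ok in binds.items():
        s, e = constructs[name]
        if not ok and s <= start and e >= end:
            raise ValueError(f"{name} spans {start}-{end} but does not bind")
    return start, end


constructs = {"C1": (90, 213), "C2": (120, 213), "C3": (143, 213),
              "N1": (1, 90), "N2": (1, 166), "N3": (1, 192)}
binds = {"C1": True, "C2": True, "C3": True, "N1": False, "N2": False, "N3": True}
print(map_binding_region(constructs, binds))
```

The intersection is an upper bound on the minimal binding region; finer deletions or point
mutations are needed to narrow it further.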

THAP1 mutants were also tested in the in vitro THAP1/Par4 interaction assay. In vitro-translated
THAP1 mutants were generated with the TNT-coupled reticulocyte lysate system
(Promega, Madison, WI, USA) using pGBKT7-THAP1-C1, -C2, -C3, -N1, -N2 or -N3 vector as
template. 25 µl of each 35S-labelled THAP1 mutant was incubated with immobilized GST or
GST-Par4 protein overnight at 4 °C, in the following binding buffer: 10 mM NaPO4 pH 8.0, 140 mM
NaCl, 3 mM MgCl2, 1 mM dithiothreitol (DTT), 0.05% NP40, 0.2 mM phenylmethyl
sulphonyl fluoride (PMSF), 1 mM Na vanadate, 50 mM β-glycerophosphate, 25 µg/ml
chymotrypsin, 5 µg/ml aprotinin and 10 µg/ml leupeptin. Beads were then washed 5 times in 1 ml
binding buffer. Bound proteins were eluted with 2X Laemmli SDS-PAGE sample buffer,
fractionated by 10% SDS-PAGE and visualized by fluorography using Amplify (Amersham
Pharmacia Biotech). As expected, THAP1-C1, -C2, -C3 and -N3 interacted with GST/Par4DD
(Figure 4B). In contrast, THAP1-N1 and -N2 failed to interact with GST/Par4DD beads.

Finally, the Par4 binding activity of THAP1 mutants was also analyzed by the in vivo
THAP1/Par4 interaction assay as described in Example 6 using pEF-mycPar4DD and
pEGFP.C2-THAP1-C1, -C2, -C3, -N1, -N2 or -N3 expression vectors.

Essentially identical results were obtained with the three THAP1/Par4 interaction assays
(Figure 4A). That is, the Par4 binding site was found between residues 143 and 192 of human
THAP1. Comparison of this region with the Par4 binding domain of mouse ZIP kinase, another
Par4-interacting protein, revealed a conserved arginine-rich sequence motif (SEQ
ID NOs: 205, 263 and 15) that may correspond to the Par4 binding site (Figure 5A). Mutations in
this arginine-rich sequence motif were generated by site-directed mutagenesis. The two resulting
THAP1 mutants, THAP1 RR/AA (substitutions R171A and R172A) and
THAP1 ΔQRCRR (deletion of residues 168-172), were generated by two successive rounds of PCR
using pEGFP.C2-THAP1 as template and primers 2HMR10 and 2HMR9 together with primers

RR/AA-1 (5'-CCGCACAGCAGCGATGCGCTGCTCAAGAACGGCAGCTTG-3') (SEQ ID NO: 206) and

RR/AA-2 (5'-CAAGCTGCCGTTCTTGAGCAGCGCATCGCTGCTGTGCGG-3') (SEQ ID NO: 207) for mutant THAP1 RR/AA, or

primers ΔRR-1 (5'-GCTCAAGACCGCACAGCAAGAACGGCAGCTTG-3') (SEQ ID NO: 208) and

ΔRR-2 (5'-CAAGCTGCCGTTCTTGCTGTGCGGTCTTGAGC-3') (SEQ ID NO: 209) for
mutant THAP1 ΔQRCRR. The resulting PCR fragments were digested with EcoRI and BamHI, and
cloned in frame downstream of the Gal4 Binding Domain (Gal4-BD) in the pGBKT7 two-hybrid vector
(Clontech) to generate pGBKT7-THAP1-RR/AA and -Δ(QRCRR), or downstream of the Enhanced
Green Fluorescent Protein (EGFP) ORF in the pEGFP.C2 vector (Clontech) to generate
pEGFP.C2-THAP1-RR/AA and -Δ(QRCRR). The THAP1 RR/AA and THAP1 ΔQRCRR mutants were
then tested in the three THAP1/Par4 interaction assays (two-hybrid assay, in vitro THAP1/Par4
interaction assay and in vivo THAP1/Par4 interaction assay) as described above for the THAP1-C1,
-C2, -C3, -N1, -N2 and -N3 mutants. This analysis revealed that both mutants were deficient for
interaction with Par4 in all three assays (Figure 5B), indicating that the arginine-rich
sequence motif we identified is a novel Par4-binding motif.
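The two mutagenic primer pairs (SEQ ID NOs: 206/207 and 208/209) are exact reverse complements of each other, as overlapping primers for two successive rounds of PCR would be. The minimal Python sketch below checks that and translates the mutated region using the standard genetic code. The reading-frame offsets (2 for SEQ ID NO: 206, 1 for SEQ ID NO: 208) are an assumption inferred from the codon pattern, not stated in the text.

```python
# Sketch: verify complementary mutagenic primers and translate the mutated region.
# Primer sequences are copied from the text; reading-frame offsets are inferred.

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codon -> one-letter amino acid ('*' = stop).
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

PRIMERS = {
    "SEQ ID NO: 206": "CCGCACAGCAGCGATGCGCTGCTCAAGAACGGCAGCTTG",
    "SEQ ID NO: 207": "CAAGCTGCCGTTCTTGAGCAGCGCATCGCTGCTGTGCGG",
    "SEQ ID NO: 208": "GCTCAAGACCGCACAGCAAGAACGGCAGCTTG",
    "SEQ ID NO: 209": "CAAGCTGCCGTTCTTGCTGTGCGGTCTTGAGC",
}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def translate(seq, frame=0):
    """Translate the complete codons that start at 0-based offset `frame`."""
    seq = seq.upper()
    return "".join(CODON_TABLE[seq[i:i + 3]]
                   for i in range(frame, len(seq) - 2, 3))


if __name__ == "__main__":
    for fwd, rev in [("SEQ ID NO: 206", "SEQ ID NO: 207"),
                     ("SEQ ID NO: 208", "SEQ ID NO: 209")]:
        match = reverse_complement(PRIMERS[fwd]) == PRIMERS[rev]
        print(f"{fwd} / {rev} complementary: {match}")
    print(translate(PRIMERS["SEQ ID NO: 206"], frame=2))  # AQQRCAAQERQL
    print(translate(PRIMERS["SEQ ID NO: 208"], frame=1))  # LKTAQQERQL
```

Read this way, the substitution primer encodes QRCAA where the native motif would read QRCRR, and the deletion primer encodes a stretch that lacks QRCRR. Both readings are consistent with the description of the two mutants.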

EXAMPLE 8 

PAR4 is a novel component of PML-NBs that colocalizes with THAP1 in vivo
We then wished to determine whether PAR4 colocalizes with THAP1 in vivo, in order to provide
further evidence for a physiological interaction between THAP1 and PAR4. We first analyzed Par4
subcellular localization in primary human endothelial cells. Confocal immunofluorescence
microscopy using affinity-purified anti-PAR4 antibodies (Sells et al., 1997; Guo et al., 1998) was
performed on HUVEC endothelial cells fixed with methanol/acetone, which makes PML-NB
components accessible to antibodies (Sternsdorf et al., 1997). Cells were fixed in methanol for 5
min at -20°C, followed by incubation in cold acetone at -20°C for 30 sec. Permeabilized cells were
then blocked with PBS-BSA (PBS with 1% bovine serum albumin) for 10 min and then incubated for 2 hr
at room temperature with rabbit polyclonal antibodies against human Par4 (1/50, R-334, Santa Cruz
Biotechnology, Santa Cruz, CA, USA) and mouse monoclonal antibody anti-PML (mouse IgG1,
1/30, mAb PG-M3 from Dako, Glostrup, Denmark). Cells were then washed three times for 5 min at
room temperature in PBS-BSA and incubated for 1 hr with Cy3 (red fluorescence)-conjugated goat
anti-rabbit IgG (1/1000, Amersham Pharmacia Biotech) and FITC-labeled goat anti-mouse IgG
(1/40, Zymed Laboratories Inc., San Francisco, CA, USA) secondary antibodies, diluted in PBS-BSA.
After extensive washing in PBS, samples were air dried and mounted in Mowiol. Images
were collected on a Leica confocal laser scanning microscope. The FITC (green) and Cy3 (red)
fluorescence signals were recorded sequentially for identical image fields to avoid cross-talk
between the channels. This analysis showed an association of PAR4 immunoreactivity with nuclear
dot-like structures, in addition to diffuse nucleoplasmic and cytoplasmic staining. Double
immunostaining with anti-PML antibodies revealed that the PAR4 foci colocalize perfectly with
PML-NBs in cell nuclei. Colocalization of Par4 with GFP-THAP1 in PML-NBs was analyzed in
transfected HUVEC expressing ectopic GFP-THAP1. HUVEC were grown in complete
ECGM medium (PromoCell, Heidelberg, Germany), plated on coverslips and transiently transfected
with the GFP-THAP1 expression construct (pEGFP.C2-THAP1) in RPMI medium using GeneJammer
transfection reagent according to the manufacturer's instructions (Stratagene, La Jolla, CA, USA).
Analysis of transfected cells by indirect immunofluorescence microscopy 24 h later, with anti-Par4
rabbit antibodies, revealed that all endogenous PAR4 foci colocalize with ectopic GFP-THAP1 in
PML-NBs, further confirming the association of the THAP1/PAR4 complex with PML-NBs in vivo.
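The colocalization above is described qualitatively. One common way to put a number on such overlap, not used in the text, is Pearson's correlation coefficient between the pixel intensities of two channels recorded on the same field. A minimal sketch, assuming the two channels have already been exported as equal-length lists of intensities; the values in the example are hypothetical.

```python
import math


def pearson_colocalization(channel_a, channel_b):
    """Pearson correlation between paired pixel intensities of two channels."""
    if len(channel_a) != len(channel_b) or len(channel_a) < 2:
        raise ValueError("need two equal-length lists with at least 2 pixels")
    mean_a = sum(channel_a) / len(channel_a)
    mean_b = sum(channel_b) / len(channel_b)
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(channel_a, channel_b))
    var_a = sum((a - mean_a) ** 2 for a in channel_a)
    var_b = sum((b - mean_b) ** 2 for b in channel_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a channel with constant intensity has no defined correlation")
    return cov / math.sqrt(var_a * var_b)


if __name__ == "__main__":
    # Hypothetical intensities: red = anti-PAR4 (Cy3), green = anti-PML (FITC).
    red = [12, 15, 210, 198, 10, 185, 14]
    green = [9, 11, 190, 205, 8, 170, 12]
    print(round(pearson_colocalization(red, green), 3))
```

A value near 1 indicates that bright pixels in one channel are bright in the other. Background pixels influence the coefficient, so restricting the calculation to nuclear regions of interest is a common refinement.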
EXAMPLE 9

PML recruits the THAP1/PAR4 complex to PML-NBs 
Since PML has been shown to play a critical role in the assembly of PML-NBs by
recruiting other components, we next wanted to determine whether PML plays a role in the
recruitment of the THAP1/PAR4 complex to PML-NBs. For this purpose, we made use of the
observation that neither endogenous PAR4 nor ectopic GFP-THAP1 accumulates in PML-NBs
in human HeLa cells. Expression vectors for GFP-THAP1 and HA-PML (or HA-SP100) were
cotransfected into these cells, and the localization of endogenous PAR4, GFP-THAP1 and HA-PML
(or HA-SP100) was analyzed by triple-staining confocal microscopy.

Human HeLa cells (ATCC) were grown in Dulbecco's Modified Eagle's Medium
supplemented with 10% fetal calf serum and 1% penicillin-streptomycin (all from Life
Technologies, Grand Island, NY, USA), plated on coverslips, and transiently transfected by the
calcium phosphate method using 2 µg of pEGFP.C2-THAP1 and pcDNA.3-HA-PML3 or pSG5-HA-Sp100
(a gift from Dr Dejean, Institut Pasteur, Paris, France) plasmid DNA. pcDNA.3-HA-PML3
was constructed by subcloning a BglII-BamHI fragment from pGADT7-HA-PML3 into the
BamHI site of the pcDNA3 expression vector (Invitrogen, San Diego, CA, USA). To generate
pGADT7-HA-PML3, the PML3 ORF was amplified by PCR, using pACT2-PML3 (a gift from Dr de
Thé, Paris, France) as template, with primers

PML-1 (5'-GCGGGATCCCTAAATTAGAAAGGGGTGGGGGTAGCC-3') (SEQ ID NO: 210) and

PML-2 (5'-GCGGAATTCATGGAGCCTGCACCCGCCCGATC-3') (SEQ ID NO: 211),
and cloned into the EcoRI and BamHI sites of pGADT7.

HeLa cells transfected with GFP-tagged and HA-tagged expression constructs were allowed
to grow for 24 h to 48 h on coverslips. Cells were washed twice with PBS, fixed in methanol for 5
min at -20°C, followed by incubation in cold acetone at -20°C for 30 sec. Permeabilized cells were
then blocked with PBS-BSA (PBS with 1% bovine serum albumin) for 10 min and then incubated for 2 hr
at room temperature with the following primary antibodies diluted in PBS-BSA: rabbit polyclonal
antibodies against human Par4 (1/50, R-334, Santa Cruz Biotechnology, Santa Cruz, CA, USA) and
mouse monoclonal antibody anti-HA tag (mouse IgG1, 1/1000, mAb 16B12 from BabCO,
Richmond, CA, USA). Cells were then washed three times for 5 min at room temperature in PBS-BSA,
and incubated for 1 hr with Cy3 (red fluorescence)-conjugated goat anti-rabbit IgG (1/1000,
Amersham Pharmacia Biotech) and Alexa Fluor 633 (blue fluorescence)-conjugated goat anti-mouse IgG
(1/100, Molecular Probes, Eugene, OR, USA) secondary antibodies, diluted in PBS-BSA.
After extensive washing in PBS, samples were air dried and mounted in Mowiol. Images were
collected on a Leica confocal laser scanning microscope. The GFP (green), Cy3 (red) and Alexa
633 (blue) fluorescence signals were recorded sequentially for identical image fields to avoid
cross-talk between the channels.
In HeLa cells transfected with HA-PML, endogenous PAR4 and GFP-THAP1 were
recruited to PML-NBs, whereas in cells transfected with HA-SP100, both PAR4 and GFP-THAP1
exhibited diffuse staining without accumulation in PML-NBs. These findings indicate that
recruitment of the THAP1/PAR4 complex to PML-NBs depends on PML but not on SP100.

EXAMPLE 10 
THAP1 is an apoptosis-inducing polypeptide
THAP1 is a novel pro-apoptotic factor

Since PML and PML-NBs have been linked to the regulation of cell death, and PAR4 is a
well-established pro-apoptotic factor, we examined whether THAP1 can modulate cell survival. Mouse
3T3 cells, which have previously been used to analyze the pro-apoptotic activity of PAR4
(Diaz-Meco et al., 1996; Berra et al., 1997), were transfected with expression vectors for GFP-THAP1,
GFP-PAR4 and, as a negative control, GFP-APS kinase-1 (APSK-1), a nuclear enzyme unrelated to
THAP1 and apoptosis (Girard et al., 1998; Besset et al., 2000). We then determined whether
ectopic expression of THAP1 enhances the apoptotic response to serum withdrawal. Transfected
cells were deprived of serum for up to 24 hours, and cells with apoptotic nuclei, as revealed
by DAPI staining and in situ TUNEL assay, were counted.

Cell death assays: Mouse 3T3-TO fibroblasts were seeded on coverslips in 12-well plates
at 40 to 50% confluency and transiently transfected with GFP or GFP-fusion protein expression
vectors using Lipofectamine Plus reagent (Life Technologies) according to the supplier's instructions.
After 6 h at 37°C, the DNA-lipid mixture was removed and the cells were allowed to recover in
complete medium for 24 h. Serum starvation of transiently transfected cells was induced by
changing the medium to 0% serum, and the proportion of GFP-positive apoptotic cells was assessed
24 h after induction of serum starvation. Cells were fixed in PBS containing 3.7% formaldehyde and
permeabilized with 0.1% Triton X-100 as described under immunofluorescence, and apoptosis was
scored by in situ TUNEL (terminal deoxynucleotidyl transferase-mediated dUTP nick end labeling)
and/or DAPI (4',6-diamidino-2-phenylindole) staining of apoptotic nuclei exhibiting nuclear
condensation. The TUNEL reaction was performed for 1 hr at 37°C using the In Situ Cell Death
Detection Kit, TMR red (Roche Diagnostics, Meylan, France). DAPI staining at a final
concentration of 0.2 µg/ml was performed for 10 min at room temperature. At least 100 cells were
scored for each experimental point using a fluorescence microscope.
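Scoring at least 100 cells per point leaves some binomial uncertainty in each percentage. As a sketch that is not part of the described protocol, the Wilson score interval gives a confidence range for a proportion of k apoptotic cells among n scored. The count in the example is hypothetical.

```python
import math


def wilson_interval(k, n, z=1.96):
    """Wilson score interval for k positive cells out of n scored (z=1.96 ~ 95%)."""
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("require n > 0 and 0 <= k <= n")
    p = k / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    # Hypothetical count: 18 apoptotic nuclei among 100 GFP-positive cells.
    low, high = wilson_interval(18, 100)
    print(f"18/100 apoptotic: 95% CI {low:.1%} to {high:.1%}")
```

For 18 of 100 cells the interval spans roughly 12% to 27%. Differences of a few percentage points between conditions are therefore hard to resolve with single counts of this size, while large differences remain clear.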

Basal levels of apoptosis in the presence of serum ranged from 1% to 3%. Twenty-four hours
after serum withdrawal, apoptosis was found in 18% of untransfected 3T3 cells and of 3T3 cells
overexpressing GFP-APSK-1. Serum withdrawal-induced apoptosis was significantly
increased, to about 70% and 65% in cells overexpressing GFP-PAR4 and GFP-THAP1, respectively
(Figure 6A). These results demonstrate that THAP1, like PAR4, is an apoptosis-inducing
polypeptide.
TNFα-induced apoptosis assays were performed by incubating transiently transfected cells
in complete medium containing 30 ng/ml of mTNFα (R&D, Minneapolis, MN, USA) for 24 h.
Apoptosis was scored as described for serum withdrawal-induced apoptosis. As shown in
Figure 6B, THAP1 also induced apoptosis under these conditions.

EXAMPLE 11

The THAP domain is essential for THAP1 pro-apoptotic activity
To determine the role of the amino-terminal THAP domain (amino acids 1 to 89) in the
functional activity of THAP1, we generated a THAP1 mutant lacking the THAP domain
(THAP1ΔTHAP). THAP1ΔTHAP (amino acids 90-213) was amplified by PCR, using
pEGFP.C2-THAP1 as template, with primers 2HMR12
(5'-GCGGAATTCAAAGAAGATCTTCTGGAGCCACAGGAAC-3') (SEQ ID NO: 212) and 2HMR9
(5'-CGCGGATCCTGCTGGTACTTCAACTATTTCAAAGTAGTC-3') (SEQ ID NO: 213),
digested with EcoRI and BamHI, and cloned into the pGBKT7 and pEGFP-C2 vectors to generate the
pGBKT7-THAP1ΔTHAP and pEGFP.C2-THAP1ΔTHAP expression vectors. The role of the
THAP domain in PML-NB localization, Par4 binding and pro-apoptotic activity of THAP1 was
then analyzed.

To analyze the subcellular localization of THAP1ΔTHAP, the GFP-THAP1ΔTHAP
expression construct was transfected into human primary endothelial cells from umbilical vein
(HUVEC, PromoCell, Heidelberg, Germany). HUVEC were grown in complete ECGM medium
(PromoCell, Heidelberg, Germany), plated on coverslips and transiently transfected in RPMI
medium using GeneJammer transfection reagent according to the manufacturer's instructions
(Stratagene, La Jolla, CA, USA). Transfected cells were allowed to grow for 48 h on coverslips.
Cells were then washed twice with PBS, fixed for 15 min at room temperature in PBS containing
3.7% formaldehyde, and washed again with PBS prior to neutralization with 50 mM NH4Cl in PBS
for 5 min at room temperature. Following one more PBS wash, cells were permeabilized for 5 min at
room temperature in PBS containing 0.1% Triton X-100, and washed again with PBS.
Permeabilized cells were then blocked with PBS-BSA (PBS with 1% bovine serum albumin) for
10 min and then incubated for 2 hr at room temperature with mouse monoclonal antibody anti-PML
(mouse IgG1, 1/30, mAb PG-M3 from Dako, Glostrup, Denmark) diluted in PBS-BSA. Cells
were then washed three times for 5 min at room temperature in PBS-BSA and incubated for 1 hr with
Cy3 (red fluorescence)-conjugated goat anti-mouse IgG (1/1000, Amersham Pharmacia Biotech)
secondary antibodies, diluted in PBS-BSA. After extensive washing in PBS, samples were air dried
and mounted in Mowiol. Images were collected on a Leica confocal laser scanning microscope.
The GFP (green) and Cy3 (red) fluorescence signals were recorded sequentially for identical image
fields to avoid cross-talk between the channels.
This analysis revealed that GFP-THAP1ΔTHAP staining completely overlaps with
the staining pattern obtained with antibodies directed against PML, indicating that the THAP domain
is not required for THAP1 localization to PML-NBs.

To examine the role of the THAP domain in binding to Par4, we performed in vitro GST
pull-down assays. Par4DD, expressed as a GST-tagged fusion protein and immobilized on
glutathione sepharose, was incubated with radiolabeled in vitro-translated THAP1ΔTHAP.
In vitro-translated THAP1ΔTHAP was generated with the TNT-coupled reticulocyte lysate system
(Promega, Madison, WI, USA) using the pGBKT7-THAP1ΔTHAP vector as template. 25 µl of
35S-labelled THAP1ΔTHAP was incubated with immobilized GST-Par4 or GST proteins overnight at
4 °C in the following binding buffer: 10 mM NaPO4 pH 8.0, 140 mM NaCl, 3 mM MgCl2, 1 mM
dithiothreitol (DTT), 0.05% NP40, 0.2 mM phenylmethylsulphonyl fluoride (PMSF), 1 mM
Na vanadate, 50 mM β-glycerophosphate, 25 µg/ml chymotrypsin, 5 µg/ml aprotinin and 10 µg/ml
leupeptin. Beads were then washed 5 times in 1 ml binding buffer. Bound proteins were eluted
with 2X Laemmli SDS-PAGE sample buffer, fractionated by 10% SDS-PAGE and visualized by
fluorography using Amplify (Amersham Pharmacia Biotech).

This analysis revealed that THAP1ΔTHAP interacts with GST/Par4DD, indicating that the
THAP domain is not involved in the THAP1/Par4 interaction (Figure 7A).

To examine the role of the THAP domain in THAP1 pro-apoptotic activity, we performed
cell death assays in mouse 3T3 cells. Mouse 3T3-TO fibroblasts were seeded on coverslips in
12-well plates at 40 to 50% confluency and transiently transfected with GFP-APSK1, GFP-THAP1 or
GFP-THAP1ΔTHAP fusion protein expression vectors using Lipofectamine Plus reagent (Life
Technologies) according to the supplier's instructions. After 6 h at 37°C, the DNA-lipid mixture was
removed and the cells were allowed to recover in complete medium for 24 h. Serum starvation of
transiently transfected cells was induced by changing the medium to 0% serum, and the proportion of
GFP-positive apoptotic cells was assessed 24 h after induction of serum starvation. Cells were
fixed in PBS containing 3.7% formaldehyde and permeabilized with 0.1% Triton X-100 as
described under immunofluorescence, and apoptosis was scored by in situ TUNEL (terminal
deoxynucleotidyl transferase-mediated dUTP nick end labeling) and/or DAPI
(4',6-diamidino-2-phenylindole) staining of apoptotic nuclei exhibiting nuclear condensation. The TUNEL reaction
was performed for 1 hr at 37°C using the In Situ Cell Death Detection Kit, TMR red (Roche
Diagnostics, Meylan, France). DAPI staining at a final concentration of 0.2 µg/ml was performed
for 10 min at room temperature. At least 100 cells were scored for each experimental point using a
fluorescence microscope.

Twenty-four hours after serum withdrawal, apoptosis was found in 18% of untransfected
3T3 cells and of 3T3 cells overexpressing GFP-APSK-1. Serum withdrawal-induced
apoptosis was significantly increased, to about 70%, in cells overexpressing GFP-THAP1. Deletion
of the THAP domain abrogated most of this effect, since serum withdrawal-induced apoptosis was
reduced to 28% in cells overexpressing GFP-THAP1ΔTHAP (Figure 7B). These results indicate
that the THAP domain, although not required for THAP1 localization to PML-NBs or for Par4 binding,
is essential for THAP1 pro-apoptotic activity.
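One way to put a number on "abrogated most of this effect" is to use the percentages reported above, with the GFP-APSK-1 value (18%) as baseline, and ask what share of the THAP1-induced increase is lost in the ΔTHAP mutant. This framing is an illustration, not an analysis performed in the text; the function name is hypothetical.

```python
def fraction_of_effect_abrogated(baseline, full_effect, mutant):
    """Share of the induced increase (full_effect - baseline) that is lost in the mutant."""
    induced = full_effect - baseline
    if induced <= 0:
        raise ValueError("full_effect must exceed baseline")
    return (full_effect - mutant) / induced


if __name__ == "__main__":
    # Percent apoptosis reported in the text, 24 h after serum withdrawal.
    control, thap1, thap1_delta_thap = 18, 70, 28
    share = fraction_of_effect_abrogated(control, thap1, thap1_delta_thap)
    print(f"{share:.0%} of the THAP1-induced increase is lost")  # 81%
```

The ratio is (70 - 28) / (70 - 18) = 42/52, so about four fifths of the induced increase is lost when the THAP domain is deleted.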

EXAMPLE 12 

The THAP domain defines a novel family of proteins, the THAP family 
To discover novel human proteins homologous to THAP1 and/or containing THAP
domains, the GenBank non-redundant, human EST and draft human genome databases at the National
Center for Biotechnology Information (www.ncbi.nlm.nih.gov) were searched with both the
nucleotide and amino acid sequences of THAP1, using the programs BLASTN, TBLASTN and
BLASTP (Altschul, S. F., Gish, W., Miller, W., Myers, E. W. and Lipman, D. J. (1990). Basic local
alignment search tool. J Mol Biol 215: 403-410). This initial step enabled us to identify 12 distinct
human THAP-containing proteins (hTHAP0 to hTHAP11; Figure 8). For partial-length
sequences, assembly of overlapping ESTs together with GENSCAN (Burge, C. and Karlin,
S. (1997). Prediction of complete gene structures in human genomic DNA. J Mol Biol 268: 78-94)
and GENEWISE (Jareborg, N., Birney, E. and Durbin, R. (1999). Comparative analysis of
noncoding regions of 77 orthologous mouse and human gene pairs. Genome Res 9: 815-824) gene
predictions on the corresponding genomic DNA clones was used to define the full-length human
THAP proteins as well as their corresponding cDNAs and genes. CLUSTALW (Higgins, D. G.,
Thompson, J. D. and Gibson, T. J. (1996). Using CLUSTAL for multiple sequence alignments.
Methods Enzymol 266: 383-402) was used to align the 12 human THAP
domains with the DNA-binding domain of Drosophila P-element transposase (Lee, C. C., Beall, E.
L., and Rio, D. C. (1998) Embo J. 17:4166-74), and the alignment was shaded using the program
Boxshade (www.ch.embnet.org/software/BOX_form.html) (see Figures 9A and 9B). An equivalent
approach was used to identify the mouse, rat, pig and various
other orthologs of the human THAP proteins (Figure 9C). Altogether, the in silico and
experimental approaches led to the discovery of 12 distinct human members (hTHAP0 to
hTHAP11) of the THAP family of pro-apoptotic factors (Figure 8).
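Percent identity is one summary statistic often read off pairwise comparisons taken from such alignments. It is not one reported in the text. A minimal sketch, assuming two already-aligned sequences of equal length with "-" as the gap character, computes identity over the columns where neither sequence is gapped. The toy sequences are hypothetical, not THAP domain sequences.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical residues over columns where neither sequence is gapped."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if x != gap and y != gap]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    identical = sum(1 for x, y in columns if x == y)
    return 100.0 * identical / len(columns)


if __name__ == "__main__":
    # Hypothetical aligned toy sequences (not real THAP domains).
    print(percent_identity("MPKSC-AAR", "MPRHCSAAR"))  # 75.0
```

Conventions differ: some tools divide by the full alignment length, gaps included. Identity values from different programs are therefore not always directly comparable.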

EXAMPLE 13 
THAP2 and THAP3 interact with Par-4 
To assess whether THAP2 and THAP3 are able to interact with Par-4, yeast two hybrid 
assays using Par-4 wild-type bait (Figure 10B) and in vitro GST pull-down assays (Figure 10C)
were performed as described above (Examples 4 and 5). As shown in Figures 10B and 10C,
THAP2 and THAP3 are able to interact with Par-4. A sequence alignment showing the comparison 
of the THAP domain and the PAR4-binding domain between THAP1, THAP2 and THAP3 is 
shown in Figure 10A. 
EXAMPLE 14
THAP2 and THAP3 are able to induce apoptosis
Serum withdrawal-induced and TNFα-induced apoptosis analyses were performed as described above
(Example 10) in cells transfected with GFP-APSK1, GFP-THAP2 or GFP-THAP3 expression vectors.
Apoptosis was quantified by DAPI staining of apoptotic nuclei 24 hours after serum withdrawal or
addition of TNFα. The results are shown in Figure 11A (serum withdrawal) and Figure 11B
(TNFα). These results indicate that THAP2 and THAP3 induce apoptosis.

EXAMPLE 15 

Identification of the SLC/CCL21 chemokine-binding domain of human THAP1

To identify the SLC/CCL21 chemokine-binding domain of human THAP1, a series of
THAP1 deletion constructs was generated as described in Example 7.

Two-hybrid interaction between THAP1 mutants and chemokine SLC/CCL21 was tested
by cotransformation of yeast strain AH109 with pGADT7-THAP1-C1, -C2, -C3, -N1, -N2 or -N3 and
pGBKT7-SLC/CCL21, and selection of transformants by His and Ade double auxotrophy according
to the manufacturer's instructions (MATCHMAKER two-hybrid system 3, Clontech). The
pGBKT7-SLC/CCL21 vector was generated by subcloning the BamHI SLC/CCL21 fragment from
pGBT9-SLC (see Example 1) into the unique BamHI cloning site of vector pGBKT7 (Clontech). Positive
two-hybrid interaction with chemokine SLC/CCL21 was observed with mutants THAP1-C1, -C2 and
-C3, but not with mutants THAP1-N1, -N2 and -N3, suggesting that the SLC/CCL21
chemokine-binding domain of human THAP1 lies between THAP1 residues 143 and 213 (Figure 12).

EXAMPLE 16 
In vitro THAP1/chemokine SLC/CCL21 interaction assay

To confirm the interaction observed in the yeast two-hybrid system, we performed in vitro GST
pull-down assays. THAP1, expressed as a GST-tagged fusion protein and immobilized on
glutathione sepharose, was incubated with radiolabeled in vitro-translated SLC/CCL21.

To generate the GST-THAP1 expression vector, the full-length coding region of THAP1
(amino acids 1-213) was amplified by PCR from HEVEC cDNA with primers 2HMR8
(5'-CGCGGATCCGTGCAGTCCTGCTCCGCCTACGGC-3') (SEQ ID NO: 214) and 2HMR11
(5'-CCGAATTCTTATGCTGGTACTTCAACTATTTCAAAGTAG-3') (SEQ ID NO: 215), digested
with BamHI and EcoRI, and cloned in frame downstream of the glutathione S-transferase ORF,
between the BamHI and EcoRI sites of the pGEX-2T prokaryotic expression vector (Amersham
Pharmacia Biotech, Saclay, France). The GST-THAP1 fusion protein encoded by plasmid
pGEX-2T-THAP1 and the control GST protein encoded by plasmid pGEX-2T were then expressed in
E. coli DH5α and purified by affinity chromatography on glutathione sepharose according to the
supplier's instructions (Amersham Pharmacia Biotech). The yield of proteins used in GST pull-down assays
was determined by SDS-polyacrylamide gel electrophoresis (SDS-PAGE) and Coomassie blue staining
analysis.

In vitro-translated SLC/CCL21 was generated with the TNT-coupled reticulocyte lysate
system (Promega, Madison, WI, USA) using the pGBKT7-SLC/CCL21 vector as template (see
Example 15). 25 µl of 35S-labelled wild-type SLC/CCL21 was incubated with immobilized
GST-THAP1 or GST proteins overnight at 4 °C in the following binding buffer: 10 mM NaPO4 pH 8.0,
140 mM NaCl, 3 mM MgCl2, 1 mM dithiothreitol (DTT), 0.05% NP40, 0.2 mM phenylmethylsulphonyl
fluoride (PMSF), 1 mM Na vanadate, 50 mM β-glycerophosphate, 25 µg/ml
chymotrypsin, 5 µg/ml aprotinin and 10 µg/ml leupeptin. Beads were then washed 5 times in 1 ml
binding buffer. Bound proteins were eluted with 2X Laemmli SDS-PAGE sample buffer,
fractionated by 10% SDS-PAGE and visualized by fluorography using Amplify (Amersham
Pharmacia Biotech). As expected, GST-THAP1 interacted with SLC/CCL21 (Figure 13). In
contrast, SLC/CCL21 failed to interact with GST beads.

EXAMPLE 17 

Identification of the THAP1-binding domain of human chemokine SLC/CCL21
To determine the THAP1-binding site on human chemokine SLC/CCL21, an SLC/CCL21
deletion mutant (SLC/CCL21ΔCOOH) lacking the SLC-specific basic carboxy-terminal extension
(amino acids 102-134; GenBank Accession Number NP_002980) was generated. This
SLC/CCL21ΔCOOH mutant, which retains the CCR7 chemokine receptor binding domain of
SLC/CCL21 (amino acids 24-101), was used both in yeast two-hybrid assays with THAP1 bait and
in in vitro GST pull-down assays with GST-THAP1.

For two-hybrid assays, yeast cells were cotransformed with BD7-THAP1 and AD7-SLC/CCL21
or AD7-SLC/CCL21ΔCOOH expression vectors. The AD7-SLC/CCL21 and
AD7-SLC/CCL21ΔCOOH expression vectors were generated by subcloning the BamHI fragment (encoding
SLC amino acids 24-134) or the BamHI-PstI fragment (encoding SLC amino acids 24-102) from
pGBKT7-SLC/CCL21 (see Example 15) into the pGADT7 expression vector (Clontech). Transformants
were selected on media lacking histidine and adenine. Figure 13 shows that both wild-type
SLC/CCL21 and the SLC/CCL21ΔCOOH deletion mutant could bind to THAP1. Identical results
were obtained by cotransformation of AD7-THAP1 with BD7-SLC/CCL21 or
BD7-SLC/CCL21ΔCOOH.
GST pull-down assays, using in vitro-translated SLC/CCL21ΔCOOH generated with the
TNT-coupled reticulocyte lysate system (Promega, Madison, WI, USA) with
pGBKT7-SLC/CCL21ΔCOOH as template, were performed as described in Example 16. Figure 13 shows that
both wild-type SLC/CCL21 and the SLC/CCL21ΔCOOH deletion mutant could bind to
THAP1.

EXAMPLE 18 
Preparation of THAP1/Fc Fusion Proteins
This example describes preparation of a fusion protein comprising THAP1 or the SLC/CCL21
chemokine-binding domain of THAP1 fused to an Fc region polypeptide derived from an antibody.
An expression vector encoding the THAP1/Fc fusion protein is constructed as follows.
Briefly, the full-length coding region of human THAP1 (SEQ ID NO: 3; amino acids 1 to
213) or the SLC/CCL21 chemokine-binding domain of human THAP1 (SEQ ID NO: 3; amino
acids 143 to 213) is amplified by PCR. The oligonucleotides employed as 5' primers in the PCR
contain an additional sequence that adds a Not I restriction site upstream. The 3' primer includes an
additional sequence that encodes the first two amino acids of an Fc polypeptide, and a sequence that
adds a Bgl II restriction site downstream of the THAP1 and Fc sequences.
A recombinant vector containing the human THAP1 cDNA is employed as the template in the PCR, 
which is conducted according to conventional procedures. The amplified DNA is then digested with 
Not I and Bgl II, and the desired fragments are purified by electrophoresis on an agarose gel. 
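The 5' primer design described above, with a Not I site added upstream of the THAP1 sequence, can be sketched as follows. The annealing sequence is taken from the THAP1-specific part of primer 2HMR8 (Example 16), that is, the sequence after its BamHI tail. The two-base "GC" clamp before the site and the helper names are hypothetical. Start-codon, signal-sequence and reading-frame considerations for the Fc fusion are not handled.

```python
NOT_I = "GCGGCCGC"   # Not I recognition site
BGL_II = "AGATCT"    # Bgl II recognition site

# THAP1-annealing part of primer 2HMR8 (the sequence after its BamHI tail).
THAP1_5P_ANNEAL = "GTGCAGTCCTGCTCCGCCTACGGC"


def add_5p_site(anneal, site=NOT_I, clamp="GC"):
    """Prepend a restriction site, plus a short (hypothetical) clamp, to a primer."""
    return clamp + site + anneal.upper()


def count_site(seq, site):
    """Count occurrences of a site in a sequence, overlapping matches included."""
    seq = seq.upper()
    return sum(1 for i in range(len(seq) - len(site) + 1)
               if seq[i:i + len(site)] == site)


if __name__ == "__main__":
    primer = add_5p_site(THAP1_5P_ANNEAL)
    print(primer)
    # The annealing region itself should not contain the sites used for cloning.
    print("internal Not I sites:", count_site(THAP1_5P_ANNEAL, NOT_I))
    print("internal Bgl II sites:", count_site(THAP1_5P_ANNEAL, BGL_II))
```

Checking that the insert carries no internal Not I or Bgl II site matters because the three-way ligation relies on each enzyme cutting only at the engineered ends.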

A DNA fragment encoding the Fc region of a human IgGl antibody is isolated by digesting 
a vector containing cloned Fc-encoding DNA with Bgl II and Not I. Bgl II cleaves at a unique Bgl 
II site introduced near the 5' end of the Fc-encoding sequence, such that the Bgl II site encompasses
the codons for amino acids three and four of the Fc polypeptide. Not I cleaves downstream of the 
Fc-encoding sequence. The nucleotide sequence of cDNA encoding the Fc polypeptide, along with 
the encoded amino acid sequence, can be found in International Publication No: WO93/10151. 

In a three-way ligation, the above-described THAP1 (or SLC/CCL21 chemokine-binding 
domain of THAP1) -encoding DNA and Fc-encoding DNA are inserted into an expression vector 
that has been digested with Not I and treated with a phosphatase to minimize recircularization of 
any vector DNA without an insert. An example of a vector which can be used is pDC406 
(described in McMahan et al., EMBO J. 10:2821, 1991), which is a mammalian expression vector 
that is also capable of replication in E. coli. 

E. coli cells are then transformed with the ligation mixture, and the desired recombinant
vectors are isolated. The vectors encode amino acids 1 to 213 of the THAP1 sequence (SEQ ID
NO: 3) or amino acids 143 to 213 of the THAP1 sequence (SEQ ID NO: 3), fused to the
N-terminus of the Fc polypeptide. The encoded Fc polypeptide extends from the N-terminal hinge
region to the native C-terminus, i.e., it is an essentially full-length antibody Fc region.
CV-1/EBNA-1 cells are then transfected with the desired recombinant vectors isolated from E. coli.
CV-1/EBNA-1 cells (ATCC CRL 10478) can be transfected with the recombinant vectors by
conventional procedures. The CV-1/EBNA-1 cell line was derived from the African green monkey
kidney cell line CV-1 (ATCC CCL 70), as described by McMahan et al. (1991), EMBO J. 10:2821.
The transfected cells are cultured to allow transient expression of the THAP1/Fc or SLC/CCL21
chemokine-binding domain/Fc fusion proteins, which are secreted into the culture
medium. The secreted proteins contain the mature form of THAP1, or the SLC/CCL21
chemokine-binding domain of THAP1, fused to the Fc polypeptide. The THAP1/Fc and SLC/CCL21
chemokine-binding domain/Fc fusion proteins are believed to form dimers, in which two
such fusion proteins are joined by disulfide bonds that form between their Fc moieties. The
THAP1/Fc and SLC/CCL21 chemokine-binding domain/Fc fusion proteins can be
recovered from the culture medium by affinity chromatography on a Protein A-bearing
chromatography column.

EXAMPLE 19 
The THAP domain defines a family of nuclear factors 

To determine the subcellular localization of the different human THAP proteins, a series of
GFP-THAP expression constructs was transfected into primary human endothelial cells. Consistent
with the possible functions of THAP proteins as DNA-binding factors, we found that all
the human THAP proteins analyzed (THAP0, 1, 2, 3, 6, 7, 8, 10 and 11) localize preferentially to the
cell nucleus (Figure 14). In addition to their diffuse nuclear localization, some of the THAP
proteins also associated with distinct subnuclear structures: the nucleolus for THAP2 and
THAP3, and punctate nuclear bodies for THAP7, THAP8 and THAP11. Indirect
immunofluorescence microscopy with anti-PML antibodies revealed that the THAP8 and THAP11
nuclear bodies colocalize with PML-NBs. Although the THAP7 nuclear bodies often appeared in
close association with PML-NBs, they never colocalized with them.

Analysis of the subcellular localization of the GFP-THAP fusion proteins was performed as
described above (Example 3). The GFP-THAP constructs were generated as follows. The human
THAP0 coding region was amplified by PCR from HEVEC cDNA with primers THAP0-1
(5'-GCCGAATTCATGCCGAACTTCTGCGCTGCCCCC-3') (SEQ ID NO: 216) and THAP0-2
(5'-CGCGGATCCTTAGGTTATTTTCCACAGTTTCGGAATTATC-3') (SEQ ID NO: 217), digested
with EcoRI and BamHI, and cloned into the same sites of the pEGFP-C2 vector to generate
pEGFPC2-THAP0. The coding regions of human THAP2, 3, 7, 6 and 8 were amplified by PCR
as follows: from IMAGE clone No: 3606376 with primers THAP2-1
(5'-GCGCTGCAGCAAGCTAAATTTAAATGAAGGTACTCTTGG-3') (SEQ ID NO: 218) and
THAP2-2 (5'-GCGAGATCTGGGAAATGCCGACCAATTGCGCTGCG-3') (SEQ ID NO: 219),
digested with BglII and PstI; from IMAGE clones No: 4813302 and No: 3633743 with primers
THAP3-1 (5'-AGAGGATCCTTAGCTCTGCTGCTCTGGCCCAAGTC-3') (SEQ ID NO: 220) and
THAP3-2 (5'-AGAGAATTCATGCCGAAGTCGTGCGCGGCCCG-3') (SEQ ID NO: 221), and
primers THAP7-1 (5'-GCGGAATTCATGCCGCGTCACTGCTCCGCCGC-3') (SEQ ID NO: 222) and
THAP7-2 (5'-GCGGGATCCTCAGGCCATGCTGCTGCTCAGCTGC-3') (SEQ ID NO: 223),
digested with EcoRI and BamHI; from IMAGE clone No: 757753 with primers THAP6-1
(5'-GCGAGATCTCGATGGTGAAATGCTGCTCCGCCATTGGA-3') (SEQ ID NO: 224) and
THAP6-2 (5'-GCGGGATCCTCATGAAATATAGTCCTGTTCTATGCTCTC-3') (SEQ ID NO: 225),
digested with BglII and BamHI; and from IMAGE clone No: 4819178 with primers THAP8-1
(5'-GCGAGATCTCGATGCCCAAGTACTGCAGGGCGCCG-3') (SEQ ID NO: 226) and
THAP8-2 (5'-GCGGAATTCTTATGCACTGGGGATCCGAGTGTCCAGG-3') (SEQ ID NO: 227),
digested with BglII and EcoRI. Each fragment was cloned in frame downstream of the Enhanced Green
Fluorescent Protein (EGFP) ORF in the pEGFPC2 vector (Clontech) digested with the same enzymes
to generate pEGFPC2-THAP2, -THAP3, -THAP7, -THAP6 and -THAP8. The human THAP10 and
THAP11 coding regions were amplified by PCR from HeLa cDNA with primers
THAP10-1 (5'-GCGGAATTCATGCCGGCCCGTTGTGTGGCCGC-3') (SEQ ID NO: 228) and
THAP10-2 (5'-GCGGGATCCTTAACATGTTTCTTCTTTCACCTGTACAGC-3') (SEQ ID NO: 229),
digested with EcoRI and BamHI, and with primers THAP11-1
(5'-GCGAGATCTCGATGCCTGGCTTTACGTGCTGCGTGC-3') (SEQ ID NO: 230) and THAP11-2
(5'-GCGGAATTCTCACATTCCGTGCTTCTTGCGGATGAC-3') (SEQ ID NO: 231), digested
with BglII and EcoRI, and cloned into the same sites of the pEGFP-C2 vector to generate
pEGFPC2-THAP10 and -THAP11.
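Each fragment above is cloned with a specific enzyme pair, so a quick scan of the primers for the four recognition sites involved shows where each site sits. The minimal sketch below scans six of the primer sequences from the text. All four sites are palindromic, so scanning one strand is enough. On these inputs the scan reports the expected 5' tails. It also reports an internal PstI site (CTGCAG) in THAP8-1 and an internal BamHI site (GGATCC) in THAP8-2. The text does not discuss these.

```python
# Recognition sequences of the enzymes used in this example (all palindromic).
SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "BglII": "AGATCT", "PstI": "CTGCAG"}

# Primer sequences copied from the text (5' -> 3').
PRIMERS = {
    "THAP0-1": "GCCGAATTCATGCCGAACTTCTGCGCTGCCCCC",
    "THAP0-2": "CGCGGATCCTTAGGTTATTTTCCACAGTTTCGGAATTATC",
    "THAP2-1": "GCGCTGCAGCAAGCTAAATTTAAATGAAGGTACTCTTGG",
    "THAP2-2": "GCGAGATCTGGGAAATGCCGACCAATTGCGCTGCG",
    "THAP8-1": "GCGAGATCTCGATGCCCAAGTACTGCAGGGCGCCG",
    "THAP8-2": "GCGGAATTCTTATGCACTGGGGATCCGAGTGTCCAGG",
}


def find_sites(seq, sites=SITES):
    """Map enzyme name -> list of 0-based start positions of its site in seq."""
    seq = seq.upper()
    hits = {}
    for enzyme, motif in sites.items():
        positions = [i for i in range(len(seq) - len(motif) + 1)
                     if seq[i:i + len(motif)] == motif]
        if positions:
            hits[enzyme] = positions
    return hits


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, find_sites(seq))
```

A site that appears away from a primer's 5' tail lies within the amplified coding sequence itself. That is worth knowing before choosing the enzyme pair for a fragment.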

EXAMPLE 20 
The THAP domain shares structural similarities with the
DNA-binding domain of nuclear hormone receptors
In an effort to model the three-dimensional structure of the THAP domain, we searched the 
PDB crystallographic database. As sequence homology detection is more sensitive and selective 
when aided by secondary structure information, structural homologs of the THAP domain of human 
THAP1 were searched using the SeqFold threading program (Olszewski et al. (1999) Theor. Chem. 
Acc. 101, 57-61) which combines sequence and secondary structure alignment. The 
crystallographic structure of the thyroid hormone receptor β DNA-binding domain (DBD; PDB code: 2NLL) gave the best
score of the search, and we used the resulting structural alignment, displayed in Figure 15A, to
derive a homology-based model of the THAP domain from human THAP1 (Figure 15B). Note that
the distribution of Cys residues in the THAP domain does not fully match that of the thyroid
hormone receptor β DBD (Figure 15A) and hence does not allow the formation of the two
characteristic 'C4-type' Zn-fingers (red color-coding in Figure 15A). However, a network of
stacking interactions between aromatic/hydrophobic residues or aliphatic parts of lysine side-chains
ensures the stability of the modeled THAP domain structure (cyan color-coding in Figures 15A and
15B). Interestingly, the same threading method applied independently to the Drosophila P-element
transposase DBD identified the crystallographic structure of the glucocorticoid receptor DBD (PDB 
code: 1GLU) as giving the best score. In the same way, we used the resulting structural alignment, 
displayed in Figure 15D, to build a model of the transposase DBD (Figure 15C). Note the presence 
of a hydrophobic core equivalent to that of the THAP domain (cyan color-coding in Figures 15C
and 15D). All the DNA-binding domains of the nuclear receptors fold into a typical pattern which is
mainly based on two interacting α-helices, the first of which inserts into the major groove of the target DNA.
Our threading and modeling results indicate that the THAP domain and the D. melanogaster
P-element transposase DBD likely share a common topology which is similar to that of the DBD of
nuclear receptors. 

Molecular modeling was performed using the Insight II, SeqFold, Homology and Discover
modules of the Accelrys (San Diego, CA) molecular modeling software (version 98), run on a
Silicon Graphics O2 workstation. Optimal secondary structure prediction of the query protein
domains was ensured by the DSC method within SeqFold. The threading-derived secondary
structure alignments were used as input for homology modeling, which was performed according to
a previously described protocol (Manival et al. (2001) Nucleic Acids Res. 29:2223-2233). The
validity of the models was checked both by Ramachandran analysis and by folding consistency
verification, as previously reported (Manival et al. (2001) Nucleic Acids Res. 29:2223-2233).

EXAMPLE 21 
Homodimerization domain of human THAP1 

To identify the sequences mediating homodimerization of THAP1, a series of THAP1 
deletion constructs was generated as described in Example 7. 

Two-hybrid interaction between THAP1 mutants and wild-type THAP1 was tested by
cotransformation of yeast strain AH109 with pGADT7-THAP1-C1, -C2, -C3, -N1, -N2 or -N3 and
wild-type pGBKT7-THAP1, followed by selection of transformants by His and Ade double auxotrophy according to the
manufacturer's instructions (MATCHMAKER two-hybrid system 3, Clontech). Positive two-hybrid
interaction with wild-type THAP1 was observed with mutants THAP1-C1, -C2, -C3 and -N3 but
not with mutants THAP1-N1 and -N2, suggesting that the THAP1 homodimerization domain lies
between THAP1 residues 143 and 192 (Figure 16A).

To confirm the results obtained in yeast, THAP1 mutants were also tested in in vitro GST
pull-down assays. Wild-type THAP1 expressed as a GST-tagged fusion protein and immobilized on
glutathione sepharose (as described in Example 16) was incubated with radiolabeled, in vitro
translated THAP1 mutants. In vitro-translated THAP1 mutants were generated with the TNT
coupled reticulocyte lysate system (Promega, Madison, WI, USA) using pGADT7-THAP1-C1, -C2,
-C3, -N1, -N2 or -N3 vector as template. 25 µl of each 35S-labelled THAP1 mutant was incubated
with immobilized GST or GST-THAP1 wild-type protein overnight at 4 °C in the following
binding buffer: 10 mM NaPO4 pH 8.0, 140 mM NaCl, 3 mM MgCl2, 1 mM dithiothreitol (DTT),
0.05% NP40, 0.2 mM phenylmethylsulphonyl fluoride (PMSF), 1 mM sodium vanadate,
50 mM β-glycerophosphate, 25 µg/ml chymotrypsin, 5 µg/ml aprotinin and 10 µg/ml leupeptin. Beads were
then washed 5 times in 1 ml binding buffer. Bound proteins were eluted with 2X Laemmli
SDS-PAGE sample buffer, fractionated by 10% SDS-PAGE and visualized by fluorography using
Amplify (Amersham Pharmacia Biotech). As expected, THAP1-C1, -C2, -C3 and -N3 interacted
with GST-THAP1 (Figure 16B). In contrast, THAP1-N1 and -N2 failed to interact with
GST-THAP1 beads. Essentially identical results were therefore obtained with the two
THAP1/THAP1 interaction assays: the homodimerization domain of human THAP1 lies
between residues 143 and 192.

EXAMPLE 22 
Alternatively spliced isoform of human THAP1 
Two distinct THAP1 cDNAs, THAP1a and THAP1b, have been identified (Figure
17A). These splice variants were amplified by PCR from HEVEC cDNA with primers 2HMR10
(5'-CCGAATTCAGGATGGTGCAGTCCTGCTCCGCCT-3') (SEQ ID NO: 232) and 2HMR9
(5'-CGCGGATCCTGCTGGTACTTCAACTATTTCAAAGTAGTC-3') (SEQ ID NO: 233), digested
with EcoRI and BamHI, and cloned in frame upstream of the Enhanced Green Fluorescent Protein
(EGFP) ORF in the pEGFP-N3 vector (Clontech) to generate pEGFP-N3-THAP1a and
pEGFP-N3-THAP1b. DNA sequencing revealed that the THAP1b cDNA isoform lacks exon 2 (nucleotides 273-468)
of the human THAP1 gene (Figure 17B). This alternatively spliced isoform of human THAP1
(~2 kb mRNA) was also observed in many other tissues by Northern blot analysis (see Figure 2).
The THAP1a/GFP and THAP1b/GFP expression constructs were then transfected into COS-7 cells
(ATCC), and expression of the fusion proteins was analyzed by western blotting with anti-GFP
antibodies. The results, shown in Figure 17C, demonstrate that the second isoform of
human THAP1 (THAP1b) encodes a truncated THAP1 protein (THAP1 C3) lacking a substantial
portion of the amino terminus (amino acids 1-142 of SEQ ID NO: 3).
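The size of the segment skipped in THAP1b follows directly from the stated exon 2 coordinates. A small Python sketch, assuming the positions 273-468 are inclusive, computes its length and the remainder modulo three; a segment whose length is not a multiple of three cannot be removed from a coding sequence without shifting the downstream reading frame.

```python
def inclusive_length(start: int, end: int) -> int:
    """Number of nucleotides in the inclusive interval [start, end]."""
    if end < start:
        raise ValueError("end must not precede start")
    return end - start + 1


# Exon 2 coordinates as stated for the human THAP1 gene (assumed inclusive).
skipped = inclusive_length(273, 468)
print(f"skipped segment: {skipped} nt; remainder modulo 3: {skipped % 3}")
```

With these coordinates the segment is 196 nt long, leaving a remainder of 1.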

EXAMPLE 23 

High throughput screening assay for modulators of THAP-family
polypeptide pro-apoptotic activity
A high throughput screening assay for molecules that abrogate or stimulate THAP-family
polypeptide pro-apoptotic activity was developed, based on serum withdrawal-induced apoptosis in
a 3T3 cell line with tetracycline-regulated expression of a THAP-family polypeptide.

In a preferred example, the THAP1 cDNA with an in-frame myc tag sequence was
amplified by PCR using pGBKT7-THAP1 as a template with primers myc.BD7
(5'-GCGCTCTAGAGCCATCATGGAGGAGCAGAAGCTGATC-3') (SEQ ID NO: 234) and
2HMR15 (5'-GCGCTCTAGATTATGCTGGTACTTCAACTATTTCAAAGTAG-3') (SEQ ID
NO: 235), and cloned downstream of a tetracycline-regulated promoter in plasmid vector pTRE
(Clontech, Palo Alto, CA), using the XbaI restriction site, to generate plasmid pTRE-mycTHAP1. To
establish 3T3-TO-mycTHAP1 stable cell lines, mouse 3T3-TO fibroblasts (Clontech) were seeded
at 40 to 50% confluency and co-transfected with the pREP4 plasmid (Invitrogen), which contains a
hygromycin B resistance gene, and the mycTHAP1 expression vector (pTRE-mycTHAP1) at a 1:10
ratio, using Lipofectamine Plus reagent (Life Technologies) according to the supplier's instructions.
Transfected cells were selected in medium containing hygromycin B (250 U/ml; Calbiochem) and
tetracycline (2 µg/ml; Sigma). Several resistant colonies were picked and analyzed for
expression of mycTHAP1 by indirect immunofluorescence using an anti-myc epitope monoclonal
antibody (mouse IgG1, 1/200, Clontech). A stable 3T3-TO cell line expressing mycTHAP1
(3T3-TO-mycTHAP1) was selected and grown in Dulbecco's Modified Eagle's Medium supplemented
with 10% fetal calf serum, 1% penicillin-streptomycin (all from Life Technologies, Grand Island,
NY, USA) and tetracycline (2 µg/ml; Sigma). Induction of THAP1 expression in this
3T3-TO-mycTHAP1 cell line was obtained 48 h after removal of tetracycline from the complete medium.

A drug screening assay using the 3T3-TO-mycTHAP1 cell line can be carried out as
follows. 3T3-TO-mycTHAP1 cells are plated in 96- or 384-well microplates and THAP1
expression is induced by removal of tetracycline from the complete medium. 48 h later, the apoptotic
response to serum withdrawal is assayed in the presence of a test compound, allowing the
identification of test compounds that either enhance or inhibit the ability of the THAP1 polypeptide to
induce apoptosis. Serum starvation of 3T3-TO-mycTHAP1 cells is induced by changing the
medium to 0% serum, and the proportion of cells with apoptotic nuclei is assessed 24 h after induction
of serum starvation by TUNEL labeling in 96- or 384-well microplates. Cells are fixed in PBS
containing 3.7% formaldehyde and permeabilized with 0.1% Triton X-100, and apoptosis is scored
by in situ TUNEL (terminal deoxynucleotidyl transferase-mediated dUTP nick end labeling)
staining of apoptotic nuclei for 1 h at 37 °C using the In Situ Cell Death Detection Kit, TMR red
(Roche Diagnostics, Meylan, France). The intensity of TMR red fluorescence in each well is then
quantified to identify test compounds that modify the fluorescence signal and thus either enhance or
inhibit THAP1 pro-apoptotic activity.
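Per-well TMR red intensities are typically normalized to on-plate controls before hits are called. The Python sketch below assumes two hypothetical control sets that the protocol does not specify: non-induced (tetracycline-maintained), serum-starved, vehicle-treated wells as the low control, and induced, serum-starved, vehicle-treated wells as the high control. Each test well is expressed as a percentage of the induced response (100% means no compound effect; lower values suggest inhibition, higher values enhancement), and the Z'-factor, a widely used plate-quality statistic not mentioned in the original text, is reported.

```python
from statistics import mean, stdev


def percent_of_induced(signal: float, low_mean: float, high_mean: float) -> float:
    """Scale a well so that the low-control mean maps to 0% and the high-control mean to 100%."""
    return 100.0 * (signal - low_mean) / (high_mean - low_mean)


def z_prime(high: list, low: list) -> float:
    """Z'-factor: 1 - 3 * (sd_high + sd_low) / |mean_high - mean_low|."""
    return 1.0 - 3.0 * (stdev(high) + stdev(low)) / abs(mean(high) - mean(low))


if __name__ == "__main__":
    # Hypothetical fluorescence readings (arbitrary units), not data from the text.
    high = [1000.0, 1050.0, 980.0, 1010.0]   # induced, serum-starved, vehicle only
    low = [200.0, 210.0, 190.0, 205.0]       # non-induced, serum-starved, vehicle only
    hm, lm = mean(high), mean(low)
    print(f"Z' = {z_prime(high, low):.2f}")
    for well, value in {"B3": 610.0, "B4": 1400.0}.items():
        print(well, f"{percent_of_induced(value, lm, hm):.0f}% of induced response")
```

A Z' close to 1 indicates well-separated controls; the cut-offs used to call enhancers or inhibitors would have to be set empirically.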

EXAMPLE 24 

High throughput two-hybrid screening assay for drugs that modulate THAP-family
polypeptide/THAP-family target protein interaction
To identify drugs that modulate THAP1/Par4 or THAP1/SLC interactions, a two-hybrid 
based high throughput screening assay can be used. 

As described in Example 17, AH109 yeast cells (Clontech) cotransformed with plasmids
pGBKT7-THAP1 and pGADT7-Par4 or pGADT7-SLC can be grown in 384-well plates in
selective media lacking histidine and adenine, according to the manufacturer's instructions
(MATCHMAKER two-hybrid system 3, Clontech).

Growth of the transformants on media lacking histidine and adenine is absolutely
dependent on the THAP1/Par4 or THAP1/SLC two-hybrid interaction and drugs that disrupt 
THAP1/Par4 or THAP1/SLC binding will therefore inhibit yeast cell growth. 

Small molecules (5 mg/ml in DMSO; ChemBridge) are added by using plastic 384-pin
arrays (Genetix). The plates are incubated for 4 to 5 days at 30 °C, and small molecules which 
inhibit the growth of yeast cells by disrupting THAP1/Par4 or THAP1/SLC two-hybrid interaction 
are selected for further analysis. 
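Hit selection in this growth-based format can be sketched as a comparison with vehicle-only wells. The Python example below assumes growth is read as an end-point optical density and uses a hypothetical cut-off of 50% of the median DMSO-control growth; neither the read-out nor the threshold is specified in the text. Because generally toxic compounds would also suppress yeast growth, hits from such a rule would presumably require the further analysis mentioned above.

```python
from statistics import median


def growth_hits(well_od: dict, dmso_od: list, max_fraction: float = 0.5) -> list:
    """Return wells whose growth falls below max_fraction of the median DMSO control."""
    reference = median(dmso_od)
    cutoff = max_fraction * reference
    return sorted(well for well, od in well_od.items() if od < cutoff)


if __name__ == "__main__":
    # Hypothetical end-point readings after 4-5 days at 30 °C.
    plate = {"A01": 0.95, "A02": 0.12, "A03": 0.40, "A04": 0.88}
    controls = [0.90, 1.00, 0.85, 0.95]
    print(growth_hits(plate, controls))
```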

EXAMPLE 25 

High throughput in vitro assay to identify inhibitors of THAP-family polypeptide/THAP-family
protein target interaction

To identify small molecule modulators of THAP function, a high-throughput screen based
on fluorescence polarization (FP) is used to monitor the displacement of a fluorescently labelled
THAP1 protein from a recombinant fusion of glutathione S-transferase (GST) with the THAP1-binding
domain of Par4 (Par4DD), or from a recombinant GST-SLC/CCL21 fusion protein.

Assays are carried out essentially as in Degterev et al., Nature Cell Biol. 3: 173-182 (2001)
and Dandliker et al., Methods Enzymol. 74: 3-28 (1981). The assay can be calibrated by titrating a
THAP1 peptide labelled with Oregon Green with increasing amounts of GST-Par4DD or
GST-SLC/CCL21 protein. Binding of the peptide is accompanied by an increase in polarization
(mP, millipolarization units).
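The polarization values referred to above are derived from emission intensities measured parallel and perpendicular to the excitation plane. The Python sketch below shows the standard conversion to millipolarization units and a fraction-bound estimate. It computes fraction bound from anisotropy rather than polarization, because only anisotropy averages linearly between free and bound states, and it assumes the label's brightness does not change on binding. The instrument G-factor and the free/bound reference readings are hypothetical inputs that would be measured on the plate reader.

```python
def millipolarization(i_par: float, i_perp: float, g: float = 1.0) -> float:
    """Polarization in mP: 1000 * (I_par - G*I_perp) / (I_par + G*I_perp)."""
    return 1000.0 * (i_par - g * i_perp) / (i_par + g * i_perp)


def anisotropy(i_par: float, i_perp: float, g: float = 1.0) -> float:
    """Fluorescence anisotropy: (I_par - G*I_perp) / (I_par + 2*G*I_perp)."""
    return (i_par - g * i_perp) / (i_par + 2.0 * g * i_perp)


def fraction_bound(r: float, r_free: float, r_bound: float) -> float:
    """Fraction of labelled THAP1 bound, assuming equal brightness when free and bound."""
    return (r - r_free) / (r_bound - r_free)


if __name__ == "__main__":
    # Hypothetical intensities for free tracer, saturated complex and a test well.
    r_free = anisotropy(110.0, 90.0)
    r_bound = anisotropy(300.0, 100.0)
    r_well = anisotropy(200.0, 100.0)
    print(f"{millipolarization(200.0, 100.0):.0f} mP, "
          f"fraction bound {fraction_bound(r_well, r_free, r_bound):.2f}")
```

In the displacement format described here, a compound that disrupts the THAP1/target interaction should lower both the mP value and the estimated fraction bound.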

THAP1 and Par4 polypeptides and GST fusions can be produced as previously
described. The THAP1 peptide was expressed and purified using a QIAexpressionist kit (Qiagen)
according to the manufacturer's instructions. Briefly, the entire THAP1 coding sequence was
amplified by PCR using pGBKT7-THAP1 as a template with primers 2HMR8
(5'-CGCGGATCCGTGCAGTCCTGCTCCGCCTACGGC-3') (SEQ ID NO: 236) and 2HMR9
(5'-CGCGGATCCTGCTGGTACTTCAACTATTTCAAAGTAGTC-3') (SEQ ID NO: 237), and
cloned into the BamHI site of the pQE30 vector (Qiagen). The resulting pQE30-HisTHAP1 plasmid
was transformed into E. coli strain M15 (Qiagen). 6xHis-tagged THAP1 protein was purified from
inclusion bodies on a Ni-agarose column (Qiagen) under denaturing conditions, and the eluate was
used for in vitro interaction assays. To produce the GST-Par4DD fusion protein, Par4DD (amino acids
250-342) was amplified by PCR with primers Par4.10
(5'-GCCGGATCCGGGTTCCCTAGATATAACAGGGATGCAA-3') (SEQ ID NO: 238) and Par43
(5'-GCGGGATCCCTCTACCTGGTCAGCTGACCCACAAC-3') (SEQ ID NO: 239), and cloned
in frame downstream of the glutathione S-transferase (GST) ORF, into the BamHI site of the
pGEX-2T prokaryotic expression vector (Amersham Pharmacia Biotech, Saclay, France).
Similarly, to produce the GST-SLC/CCL21 fusion protein, the mature form of human SLC/CCL21
(amino acids 24-134) was amplified by PCR with primers hSLCbam.5'
(5'-GCGGGATCCAGTGATGGAGGGGCTCAGGACTGTTG-3') (SEQ ID NO: 240) and
hSLCbam.3' (5'-GCGGGATCCCTATGGCCCTTTAGGGGTCTGTGACC-3') (SEQ ID NO:
241), digested with BamHI and inserted into the BamHI cloning site of the pGEX-2T vector.
GST-Par4DD (amino acids 250-342) and GST-SLC/CCL21 (amino acids 24-134) fusion proteins were
expressed in E. coli DH5α (supE44, ΔlacU169 (φ80lacZΔM15), hsdR17, recA1, endA1,
gyrA96, thi-1, relA1) and purified by affinity chromatography on glutathione sepharose according
to the supplier's instructions (Amersham Pharmacia Biotech).

For screening small molecules, THAP1 peptide is labelled with succinimidyl Oregon Green
(Molecular Probes, Oregon) and purified by HPLC. 33 nM labelled THAP1 peptide, 2 µM
GST-Par4DD or GST-SLC/CCL21 protein, 0.1% bovine gamma-globulin (Sigma) and 1 mM
dithiothreitol in PBS, pH 7.2 (Gibco), are added to 384-well black plates (Lab Systems)
with a Multidrop dispenser (Lab Systems). Small molecules (5 mg/ml in DMSO; ChemBridge) are
transferred by using plastic 384-pin arrays (Genetix). The plates are incubated for 1-2 hours at
25 °C, and FP values are determined with an Analyst plate reader (LJL BioSystems).

EXAMPLE 26 

High throughput chip assay to identify inhibitors of THAP-family polypeptide/THAP-family
protein target interaction
A chip-based binding assay (Degterev et al. (2001) Nature Cell Biol. 3: 173-182) using
unlabelled THAP and THAP-family target proteins may be used to identify molecules capable of
interfering with interactions between THAP-family polypeptides and their targets, providing high
sensitivity and avoiding potential interference from label moieties. In this example, the THAP1-binding domain of
Par4 protein (Par4DD) or SLC/CCL21 is covalently attached to a surface-enhanced laser
desorption/ionization (SELDI) chip, and binding of unlabelled THAP1 protein to the immobilized
protein in the presence of a test compound is monitored by mass spectrometry.

Recombinant THAP1 protein, GST-Par4DD and GST-SLC/CCL21 fusion proteins are
prepared as described in Example 25. Purified recombinant GST-Par4DD or GST-SLC/CCL21
protein is coupled through its primary amines to SELDI chip surfaces derivatized with
carbonyldiimidazole (Ciphergen). THAP1 protein is incubated in a total volume of 1 µl for 12
hours at 4 °C in a humidified chamber to allow binding to each spot of the SELDI chip, and the
spots are then washed with alternating high-pH and low-pH buffers (0.1 M sodium acetate containing 0.5 M NaCl,
followed by 0.01 M HEPES, pH 7.3). The samples are embedded in an alpha-cyano-4-hydroxycinnamic
acid matrix and analysed for mass by matrix-assisted laser desorption ionization
time-of-flight (MALDI-TOF) mass spectrometry. Averages of 100 laser shots at a constant setting
are collected over 20 spots in each sample.
EXAMPLE 27

High throughput cell assay to identify inhibitors of THAP-family polypeptide/THAP-family
protein target interaction
A fluorescence resonance energy transfer (FRET) assay is carried out between THAP-1
and PAR4 or SLC/CCL21 proteins fused to fluorescent proteins. Assays can be carried out as in
Mahajan et al., Nature Biotechnology 16: 547-552 (1998) and Degterev et al., Nature Cell Biol. 3:
173-182 (2001).

THAP-1 protein is fused to cyan fluorescent protein (CFP) and PAR4 or SLC/CCL21
protein is fused to yellow fluorescent protein (YFP). Vectors encoding THAP-family and
THAP-family target proteins can be constructed essentially as in Mahajan et al. (1998). A THAP-1-CFP
expression vector is generated by subcloning a THAP-1 cDNA into the pECFP-N1 vector
(Clontech). PAR4-YFP or SLC/CCL21-YFP expression vectors are generated by subcloning a
PAR4 or an SLC/CCL21 cDNA into the pEYFP-N1 vector (Clontech).

Vectors are cotransfected into HEK-293 cells, and the cells are treated with test compounds.
HEK-293 cells are transfected with THAP-1-CFP and PAR4-YFP or SLC/CCL21-YFP expression
vectors using LipofectAMINE Plus (Gibco) or TransIT-LT1 (PanVera). 24 hours later, cells are
treated with test compounds and incubated for various time periods, preferably up to 48 hours.
Cells are harvested in PBS, optionally supplemented with test compound, and fluorescence is
determined with a C-60 fluorimeter (PTI) or a Wallac plate reader. The fluorescence values of samples
separately expressing THAP-1-CFP and PAR4-YFP or SLC/CCL21-YFP are added together and
used to estimate the FRET value in the absence of THAP-1/PAR4 or THAP-1/SLC/CCL21 binding.

The extent of FRET between CFP and YFP is determined as the ratio between the
fluorescence at 527 nm and that at 475 nm after excitation at 433 nm. Cotransfection of THAP-1
protein and PAR4 or SLC/CCL21 protein results in an increase of the FRET ratio over a reference
FRET ratio of 1.0 (determined using samples expressing the proteins separately). A change in the
FRET ratio upon treatment with a test compound (relative to that observed after cotransfection in the
absence of a test compound) indicates a compound capable of modulating the interaction of the
THAP-1 protein and the PAR4 or the SLC/CCL21 protein.
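The ratio calculation described above can be sketched in Python as follows. It assumes each sample is summarized by two background-corrected readings, emission at 527 nm and at 475 nm after 433 nm excitation. The no-binding reference is built, as described, by adding the signals of the two separately transfected samples, and results are normalized so that the reference equals 1.0. Function names and example values are hypothetical.

```python
def ratio_527_475(f527: float, f475: float) -> float:
    """FRET ratio: emission at 527 nm divided by emission at 475 nm (433 nm excitation)."""
    return f527 / f475


def reference_ratio(cfp_only: tuple, yfp_only: tuple) -> float:
    """No-binding reference: add the (f527, f475) signals of the separately transfected samples."""
    return (cfp_only[0] + yfp_only[0]) / (cfp_only[1] + yfp_only[1])


def normalized_fret(sample: tuple, cfp_only: tuple, yfp_only: tuple) -> float:
    """FRET ratio of a cotransfected sample relative to the reference (reference = 1.0)."""
    return ratio_527_475(*sample) / reference_ratio(cfp_only, yfp_only)


if __name__ == "__main__":
    cfp_only, yfp_only = (40.0, 100.0), (60.0, 0.0)      # hypothetical (f527, f475) readings
    untreated = normalized_fret((180.0, 100.0), cfp_only, yfp_only)
    treated = normalized_fret((120.0, 100.0), cfp_only, yfp_only)
    # A compound that disrupts binding should move the ratio back toward 1.0.
    print(f"untreated {untreated:.2f}, treated {treated:.2f}, change {treated - untreated:+.2f}")
```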
EXAMPLE 28

In vitro assay to identify THAP-family polypeptide DNA targets
The DNA-binding specificity of THAP1 was determined using a random oligonucleotide
selection method allowing unbiased analysis of binding sites selected by the THAP domain of the
THAP1 protein from a random pool of possible sites. The method was carried out essentially as
described in Bouvet (2001) Methods Mol. Biol. 148:603-610 (see also Pollock and Treisman (1990)
Nucleic Acids Res. 18:6197-6204; Blackwell and Weintraub (1990) Science 250:1104-1110; Ko and
Engel (1993) Mol. Cell. Biol. 13:4011-4022; Merika and Orkin (1993) Mol. Cell. Biol. 13:3999-4010;
and Krueger and Morimoto (1994) Mol. Cell. Biol. 14:7592-7603).
Recombinant THAP domain expression and purification 

A cDNA fragment encoding the THAP domain of human THAP1 (amino acids 1-90 of SEQ
ID NO: 3) was cloned by PCR using pGADT7-THAP1 (see Example 4) as a template with the
following primers: 5'-GCGCATATGGTGCAGTCCTGCTCCGCCTACGGC-3' (SEQ ID NO: 242)
and 5'-GCGCTCGAGTTTCTTGTCATGTGGCTCAGTACAAAG-3' (SEQ ID NO: 243). The
PCR product was cloned as an NdeI-XhoI fragment into the pET-21c prokaryotic expression vector
(Novagen) in frame with a sequence encoding a carboxy-terminal His tag, to generate
pET-21c-THAP.

For the expression of THAP-His6, pET-21c-THAP was transformed into Escherichia coli
strain BL-21 pLysS. Bacteria were grown at 37 °C to an optical density at 600 nm of 0.6, and
expression of the protein was induced by adding isopropyl-β-D-thiogalactoside (IPTG; Sigma) to a final
concentration of 1 mM; incubation was then continued for 4 hours.

The cells were collected by centrifugation and resuspended in ice-cold buffer A (50 mM
sodium phosphate pH 7.5, 300 mM NaCl, 0.1% β-mercaptoethanol, 10 mM imidazole). Cells were
lysed by sonication, and the lysate was cleared by centrifugation at 12,000 g for 45 min. The
supernatant was loaded onto a Ni-NTA agarose column (Qiagen) equilibrated in buffer A. After
washing with buffer A and then with buffer A containing 40 mM imidazole, the protein was eluted with buffer B
(same as buffer A but with 0.05% β-mercaptoethanol and 250 mM imidazole).

Fractions containing THAP-His6 were pooled and applied to a Superdex 75 gel filtration
column equilibrated in buffer C (50 mM Tris-HCl pH 7.5, 150 mM NaCl, 1 mM DTT). Fractions
containing the THAP-His6 protein were pooled, concentrated with YM-3 Amicon filter devices
and stored at 4 °C or frozen at -80 °C in buffer C containing 20% glycerol. The purity of the sample
was assessed by SDS-polyacrylamide gel electrophoresis (SDS-PAGE) and Coomassie blue staining.
The structural integrity of the protein preparation was checked by ESI mass spectrometry
and by peptide mass mapping using a MALDI-TOF mass spectrometer. The protein concentration was
determined with the Bradford protein assay.

Random Oligonucleotide Selection 
According to the SELEX protocol described in Bouvet (2001) Methods Mol. Biol. 148:603-610,
a 62-mer oligonucleotide with the following sequence was synthesized:
5'-TGGGCACTATTTATATCAAC-N25-AATGTCGTTGGTGGCCC-3' (SEQ ID NO: 244)
where N is any nucleotide, together with primers for each end. Primer P is
5'-ACCGCAAGCTTGGGCACTATTTATATCAAC-3' (SEQ ID NO: 245), and primer R is
5'-GGTCTAGAGGGCCACCAACGCATT-3' (SEQ ID NO: 246).

The 62-mer oligonucleotide is made double-stranded by PCR using the P and R primers, generating
an 80 bp random pool.
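The 80 bp size of the pool follows from the 5' tails that the two primers add to the 62-mer. The Python sketch below locates each tail by searching for a short anchor taken from the template ends (5 nucleotides, an arbitrary choice) in primer P and in primer R. It is a heuristic for this arithmetic, not a check that the primers anneal perfectly along their full length.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (N stays N)."""
    return seq.translate(COMPLEMENT)[::-1]


def tail_length(primer: str, anchor: str) -> int:
    """Number of 5' primer bases before the first exact match of anchor."""
    index = primer.find(anchor)
    if index < 0:
        raise ValueError(f"anchor {anchor} not found in primer")
    return index


# Sequences from the text; N25 is written out as 25 placeholder Ns.
TEMPLATE = "TGGGCACTATTTATATCAAC" + "N" * 25 + "AATGTCGTTGGTGGCCC"
PRIMER_P = "ACCGCAAGCTTGGGCACTATTTATATCAAC"
PRIMER_R = "GGTCTAGAGGGCCACCAACGCATT"
K = 5  # anchor length (arbitrary)

p_tail = tail_length(PRIMER_P, TEMPLATE[:K])            # P matches the top-strand 5' end
r_tail = tail_length(PRIMER_R, revcomp(TEMPLATE[-K:]))  # R matches the bottom-strand 5' end
print(len(TEMPLATE), p_tail, r_tail, len(TEMPLATE) + p_tail + r_tail)
```

The script prints a 62-nt template, a 10-nt tail on P and an 8-nt tail on R, giving 80 bp. Primer P also contains a HindIII site (AAGCTT) and primer R an XbaI site (TCTAGA), matching the enzymes used later to clone the selected pools.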

About 250 ng of THAP-His6 was incubated with Ni-NTA magnetic beads in NT2 buffer
(20 mM Tris-HCl pH 7.5, 100 mM NaCl, 0.05% NP-40) for 30 min at 4 °C on a roller. The beads
were washed twice with 500 µl of NT2 buffer to remove unbound protein. The immobilized
THAP-His6 was incubated with the random pool of double-stranded 80 bp DNA (2 to 5 µg) in 100
µl of binding buffer (20 mM Tris-HCl pH 7.5, 100 mM NaCl, 0.05% NP-40, 0.5 mM EDTA, 100
µg/ml BSA and 20 to 50 µg of poly(dI-dC)) for 10 minutes at room temperature. The beads were
then washed 6 times with 500 µl of NT2 buffer. The protein/DNA complexes were then subjected to
extraction with phenol/chloroform and precipitation with ethanol using 10 µg of glycogen as a
carrier. About one fifth of the recovered DNA was then amplified by 15 to 20 cycles of PCR and
used for the next round of selection. After 8 rounds of selection, the NaCl concentration was
progressively increased to 150 mM.

After 12 rounds of selection by THAP-His6, pools of amplified oligonucleotides were
digested with XbaI and HindIII and cloned into pBluescript II KS(-) (Stratagene), and individual
clones were sequenced using the BigDye Terminator kit (Applied Biosystems).

The results of the sequence analysis show that the THAP domain of human THAP1 is a
site-specific DNA-binding domain. Two consensus sequences were deduced from the alignment of
two sets of nucleotide sequences obtained from the above SELEX procedure (each set containing 9
nucleic acid sequences). In particular, it was found that the THAP domain recognizes GGGCAA or
TGGCAA DNA target sequences, preferentially organized as direct repeats with 5-nucleotide
spacing (DR-5 motifs), the consensus sequence being GGGCAAnnnnnTGGCAA (SEQ ID NO:
149). Additionally, THAP recognizes everted repeats with 11-nucleotide spacing (ER-11 motifs)
having the consensus sequence TTGCCAnnnnnnnnnnnGGGCAA (SEQ ID NO: 159). Although
GGGCAA and TGGCAA sequences constitute the preferential THAP domain DNA-binding sites,
GGGCAT, GGGCAG and TGGCAG sequences are also DNA target sequences recognized by the
THAP domain.
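The two consensus arrangements can be searched for in candidate sequences with regular expressions. The Python sketch below scans both strands for exact DR-5 (GGGCAAnnnnnTGGCAA) and ER-11 (TTGCCA, 11 nucleotides, GGGCAA) matches, using a lookahead so that overlapping sites are all reported. It ignores the weaker GGGCAT, GGGCAG and TGGCAG half-sites, and exact consensus matching is a simplification rather than a prediction of THAP1 binding.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Lookahead groups allow overlapping matches to be reported.
MOTIFS = {
    "DR-5": re.compile(r"(?=(GGGCAA[ACGT]{5}TGGCAA))"),
    "ER-11": re.compile(r"(?=(TTGCCA[ACGT]{11}GGGCAA))"),
}


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_thap_sites(seq: str) -> list:
    """Return (motif, strand, start, match) tuples; start is 0-based on the scanned strand."""
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for name, pattern in MOTIFS.items():
            for m in pattern.finditer(s):
                hits.append((name, strand, m.start(), m.group(1)))
    return hits


if __name__ == "__main__":
    example = "AC" + "GGGCAA" + "CCCCC" + "TGGCAA" + "AC"   # constructed test sequence
    for hit in find_thap_sites(example):
        print(hit)
```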
EXAMPLE 29

High throughput in vitro assay to identify inhibitors of THAP-family polypeptide or THAP-family
interactions with nonspecific DNA targets

High throughput assays for the detection and quantification of THAP1 binding to nonspecific DNA
are carried out using a scintillation proximity assay (SPA). Materials are available from Amersham
(Piscataway, NJ), and assays can be carried out according to Gal S. et al. (6th Ann. Conf. Soc.
Biomol. Screening, 6-9 Sept 2000, Vancouver, B.C.).

Random double-stranded DNA probes are prepared and labelled using [3H]TTP and
terminal transferase to a suitable specific activity (e.g. approx. 42 Ci/mmol). THAP1 protein or a
portion thereof is prepared, and its quantity is determined by ELISA. For assay development
purposes, electrophoretic mobility shift assays (EMSA) can be carried out to select suitable assay
parameters. For the high throughput assay, 3H-labelled DNA, anti-THAP1 monoclonal antibody
and THAP1 in binding buffer (HEPES, pH 7.5; EDTA; DTT; 10 mM ammonium sulfate; KCl and
Tween-20) are combined. The assay is configured in a standard 96-well plate and incubated at room
temperature for 5 to 30 minutes, followed by the addition of 0.5 to 2 mg of PVT protein A SPA beads
in 50-100 µl binding buffer. The radioactivity bound to the SPA beads is measured using a
TopCount™ Microplate Counter (Packard Biosciences, Meriden, CT).
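Converting SPA counts into an amount of captured label requires the counting efficiency and the specific activity of the probe. A minimal Python sketch, assuming the specific activity is expressed per mmol of the counted label and that a counting efficiency has been measured separately (both are inputs here, not values from the text), uses the fact that 1 Ci corresponds to 2.22 × 10^12 disintegrations per minute:

```python
DPM_PER_CURIE = 2.22e12   # disintegrations per minute in 1 Ci
PMOL_PER_MMOL = 1e9


def dpm_per_pmol(specific_activity_ci_per_mmol: float) -> float:
    """Disintegrations per minute produced by 1 pmol of label."""
    return DPM_PER_CURIE * specific_activity_ci_per_mmol / PMOL_PER_MMOL


def pmol_captured(cpm: float, efficiency: float, specific_activity_ci_per_mmol: float) -> float:
    """Convert background-subtracted SPA counts (cpm) to pmol of label on the beads."""
    dpm = cpm / efficiency
    return dpm / dpm_per_pmol(specific_activity_ci_per_mmol)


if __name__ == "__main__":
    # Hypothetical: 5,000 cpm above background, 40% counting efficiency, 30 Ci/mmol.
    print(f"{pmol_captured(5000.0, 0.40, 30.0):.3f} pmol")
```

For a screen, the relative drop in bound counts versus untreated wells is usually sufficient; the absolute conversion mainly helps when comparing probe preparations of different specific activity.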

EXAMPLE 30 

High throughput in vitro assay to identify inhibitors of THAP-family polypeptide or THAP-family
interactions with specific DNA targets

High throughput assays for the detection and quantification of THAP1-specific DNA
binding are carried out using a scintillation proximity assay (SPA). Materials are available from Amersham
(Piscataway, NJ), and assays can be carried out according to Gal S. et al. (6th Ann. Conf. Soc.
Biomol. Screening, 6-9 Sept 2000, Vancouver, B.C.).

THAP1-specific double-stranded DNA probes corresponding to the THAP1 DNA-binding
sequences obtained according to Example 28 are prepared. The probes are labelled using [3H]TTP
and terminal transferase to a suitable specific activity (e.g. approx. 42 Ci/mmol). THAP1 protein or
a portion thereof is prepared, and its quantity is determined by ELISA. For assay development
purposes, electrophoretic mobility shift assays (EMSA) can be carried out to select suitable assay
parameters. For the high throughput assay, 3H-labelled DNA, anti-THAP1 monoclonal antibody,
1 µg non-specific DNA (double- or single-stranded poly(dA-dT)) and THAP1 protein or a portion
thereof in binding buffer (HEPES, pH 7.5; EDTA; DTT; 10 mM ammonium sulfate; KCl and
Tween-20) are combined. The assay is configured in a standard 96-well plate and incubated at room
temperature for 5 to 30 minutes, followed by the addition of 0.5 to 2 mg of PVT protein A SPA beads
in 50-100 µl binding buffer. The radioactivity bound to the SPA beads is measured using a
TopCount™ Microplate Counter (Packard Biosciences, Meriden, CT).

EXAMPLE 31 
Preparation of antibody compositions 
Substantially pure THAP1 protein or a portion thereof is obtained. The concentration of
protein in the final preparation is adjusted, for example by concentration on an Amicon filter
device, to the level of a few micrograms per ml. Monoclonal or polyclonal antibodies to the protein
can then be prepared as follows.

Monoclonal Antibody Production by Hybridoma Fusion

Monoclonal antibodies to epitopes in the THAP1 protein or a portion thereof can be prepared from
murine hybridomas according to the classical method of Kohler and Milstein (Nature, 256: 495,
1975) or derivative methods thereof (see Harlow and Lane, Antibodies: A Laboratory Manual, Cold
Spring Harbor Laboratory, pp. 53-242, 1988).

Briefly, a mouse is repetitively inoculated with a few micrograms of the THAP1 protein or 
a portion thereof over a period of a few weeks. The mouse is then sacrificed, and the antibody 
producing cells of the spleen isolated. The spleen cells are fused by means of polyethylene glycol 
with mouse myeloma cells, and the excess unfused cells destroyed by growth of the system on 
selective media comprising aminopterin (HAT media). The successfully fused cells are diluted and 
aliquots of the dilution placed in wells of a microtiter plate where growth of the culture is 
continued. Antibody-producing clones are identified by detection of antibody in the supernatant 
fluid of the wells by immunoassay procedures, such as ELISA as originally described by Engvall, 
E., Meth. Enzymol. 70: 419 (1980). Selected positive clones can be expanded and their monoclonal 
antibody product harvested for use. Detailed procedures for monoclonal antibody production are 
described in Davis, L. et al. Basic Methods in Molecular Biology, Elsevier, New York, 
Section 21-2. 
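Diluting fused cells into microtiter wells is commonly planned with Poisson statistics. The Python sketch below, which assumes cells are distributed randomly and that every seeded cell grows (a simplification), gives the probability that a well showing growth was seeded by a single cell for a chosen mean number of cells per well; the means used are illustrative, not taken from the text.

```python
import math


def poisson(k: int, mean_cells: float) -> float:
    """Probability that a well receives exactly k cells."""
    return mean_cells ** k * math.exp(-mean_cells) / math.factorial(k)


def p_single_cell_given_growth(mean_cells: float) -> float:
    """P(exactly one cell | at least one cell) for a Poisson-seeded well."""
    return poisson(1, mean_cells) / (1.0 - poisson(0, mean_cells))


if __name__ == "__main__":
    for mean_cells in (0.3, 1.0, 3.0):
        empty = poisson(0, mean_cells)
        print(f"mean {mean_cells}: {empty:.0%} empty wells, "
              f"{p_single_cell_given_growth(mean_cells):.0%} of growing wells single-cell")
```

Lower seeding densities raise the chance that a positive well is monoclonal at the cost of more empty wells, which is one reason subcloning is often repeated.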

Polyclonal Antibody Production by Immunization 

Polyclonal antiserum containing antibodies to heterogeneous epitopes in the THAP1
protein or a portion thereof can be prepared by immunizing a suitable non-human animal with the
THAP1 protein or a portion thereof, which can be unmodified or modified to enhance
immunogenicity. A suitable non-human animal, preferably a non-human mammal, is selected. For
example, the animal may be a mouse, rat, rabbit, goat, or horse. Alternatively, a crude protein
preparation which has been enriched for THAP1 or a portion thereof can be used to generate
antibodies. Such proteins, fragments or preparations are introduced into the non-human mammal in
the presence of an appropriate adjuvant known in the art (e.g. aluminum hydroxide, RIBI, etc.).
In addition, the protein, fragment or preparation can be pretreated with an agent that will
increase antigenicity; such agents are known in the art and include, for example, methylated bovine
serum albumin (mBSA), bovine serum albumin (BSA), hepatitis B surface antigen, and keyhole
limpet hemocyanin (KLH). Serum from the immunized animal is collected, treated and tested
according to known procedures. If the serum contains polyclonal antibodies to undesired epitopes,
the polyclonal antibodies can be purified by immunoaffinity chromatography.

Effective polyclonal antibody production is affected by many factors related both to the
antigen and to the host species. Also, host animals vary in their response to the site of inoculation and to dose,
with both inadequate and excessive doses of antigen resulting in low-titer antisera. Small doses (ng
level) of antigen administered at multiple intradermal sites appear to be most reliable. Techniques
for producing and processing polyclonal antisera are known in the art; see, for example, Mayer and
Walker (1987). An effective immunization protocol for rabbits can be found in Vaitukaitis, J. et al.
J. Clin. Endocrinol. Metab. 33: 988-991 (1971). Booster injections can be given at regular
intervals, and antiserum harvested when its antibody titer, as determined semi-quantitatively,
for example, by double immunodiffusion in agar against known concentrations of the antigen,
begins to fall. See, for example, Ouchterlony, O. et al., Chap. 19 in: Handbook of Experimental
Immunology, D. Weir (ed.), Blackwell (1973). The plateau concentration of antibody is usually in the
range of 0.1 to 0.2 mg/ml of serum (about 1-2 µM). The affinity of the antisera for the antigen is
determined by preparing competitive binding curves, as described, for example, by Fisher, D., 
Chap. 42 in: Manual of Clinical Immunology, 2d Ed. (Rose and Friedman, Eds.) Amer. Soc. For 
Microbiol., Washington, D. C. (1980). 
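Mass concentrations such as the plateau antibody level mentioned above can be converted to molar units once a molecular mass is assumed. The Python sketch below uses roughly 150 kDa for IgG, an assumption that ignores the other immunoglobulin classes present in serum.

```python
IGG_MASS_G_PER_MOL = 150_000.0  # approximate IgG molecular mass (assumption)


def mg_per_ml_to_micromolar(conc_mg_per_ml: float,
                            mass_g_per_mol: float = IGG_MASS_G_PER_MOL) -> float:
    """1 mg/ml equals 1 g/l; dividing by g/mol gives mol/l, scaled here to µM."""
    return conc_mg_per_ml / mass_g_per_mol * 1e6


if __name__ == "__main__":
    for conc in (0.1, 0.2):
        print(f"{conc} mg/ml IgG ~ {mg_per_ml_to_micromolar(conc):.2f} uM")
```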

Antibody preparations prepared according to either the monoclonal or the polyclonal
protocol are useful in quantitative immunoassays that determine concentrations of antigen-bearing
substances in biological samples; they are also used semi-quantitatively or qualitatively
to identify the presence of antigen in a biological sample. The antibodies may also be used in
therapeutic compositions for killing cells expressing the protein or reducing the levels of the protein
in the body.
WHAT IS CLAIMED IS:

1. A method of identifying a test compound that modulates THAP-mediated activities 
comprising: 

contacting a THAP-family polypeptide or a biologically active fragment thereof 
with a test compound, wherein said THAP-family polypeptide comprises an amino acid 
sequence having at least 30% amino acid identity to an amino acid sequence of SEQ ID 
NO: 1; and

determining whether said test compound selectively modulates the activity of said 
THAP-family polypeptide or biologically active fragment thereof, wherein a determination 
that said test compound selectively modulates the activity of said polypeptide indicates that 
said test compound is a candidate modulator of THAP-mediated activities. 

2. The method of Claim 1, wherein the THAP-family polypeptide comprises the 
amino acid sequence of SEQ ID NO: 1, or a biologically active fragment thereof. 

3. The method of Claim 1, wherein the THAP-family polypeptide comprises the 
amino acid sequence of SEQ ID NO: 2, or a biologically active fragment thereof.

4. The method of Claim 1, wherein the THAP-family polypeptide comprises the 
amino acid sequence of SEQ ID NO: 3, or a biologically active fragment thereof. 

5. The method of Claim 1, wherein the THAP-family polypeptide comprises the 
amino acid sequence of SEQ ID NO: 4, or a biologically active fragment thereof. 

6. The method of Claim 1, wherein the THAP-family polypeptide comprises the 
amino acid sequence of SEQ ID NO: 5, or a biologically active fragment thereof.

7. The method of Claim 1, wherein the THAP-family polypeptide comprises the 
amino acid sequence of SEQ ID NO: 6, or a biologically active fragment thereof.

8. The method of Claim 1, wherein the THAP-family polypeptide comprises the 
amino acid sequence of SEQ ID NO: 7, or a biologically active fragment thereof. 

9. The method of Claim 1, wherein the THAP-family polypeptide comprises the 
amino acid sequence of SEQ ID NO: 8, or a biologically active fragment thereof. 

10. The method of Claim 1, wherein the THAP-family polypeptide comprises the 
amino acid sequence of SEQ ID NO: 9, or a biologically active fragment thereof. 

11. The method of Claim 1, wherein the THAP-family polypeptide comprises the 
amino acid sequence of SEQ ID NO: 10, or a biologically active fragment thereof. 

12. The method of Claim 1, wherein the THAP-family polypeptide comprises the 
amino acid sequence of SEQ ID NO: 11, or a biologically active fragment thereof.

13. The method of Claim 1, wherein the THAP-family polypeptide comprises the 
amino acid sequence of SEQ ID NO: 12, or a biologically active fragment thereof.

14. The method of Claim 1, wherein the THAP-family polypeptide comprises the 
amino acid sequence of SEQ ID NO: 13, or a biologically active fragment thereof. 
15. The method of Claim 1, wherein the THAP-family polypeptide comprises the
amino acid sequence of SEQ ID NO: 14, or a biologically active fragment thereof.

16. The method of Claim 1, wherein the THAP-family polypeptide comprises the 
amino acid sequence selected from the group consisting of SEQ ID NOs: 15-114, or a biologically
active fragment thereof.

17. The method of Claim 1, wherein said THAP-mediated activity is selected from the 
group consisting of interaction with a THAP-family target protein, binding to a nucleic acid, 
binding to PAR-4, binding to SLC, binding to PML, binding to a polypeptide found in PML-NBs, 
localization to PML-NBs, targeting a THAP-family target protein to PML-NBs, and inducing 
apoptosis.

18. The method of Claim 17, wherein said THAP-mediated activity is binding to PAR-4.

19. The method of Claim 17, wherein said THAP-mediated activity is binding to SLC. 

20. The method of Claim 17, wherein said THAP-mediated activity is inducing 
apoptosis. 

21. The method of Claim 17, wherein said nucleic acid comprises a nucleotide
sequence selected from the group consisting of SEQ ID NOs: 140-159. 

22. The method of Claim 1, wherein said amino acid identity is determined using an 
algorithm selected from the group consisting of XBLAST with the parameters score=50 and
wordlength=3, Gapped BLAST with the default parameters of XBLAST, and BLAST with the
default parameters of XBLAST.

23. An isolated or purified THAP domain polypeptide consisting essentially of an 
amino acid sequence selected from the group consisting of SEQ ID NOs: 1-2, amino acids 1-89 of 
SEQ ID NOs: 3-5, amino acids 1-90 of SEQ ID NOs: 6-9, amino acids 1-92 of SEQ ID NO: 10,
amino acids 1-90 of SEQ ID NOs: 11-14 and homologs having at least 30% amino acid identity to 
any aforementioned sequence, wherein said polypeptide binds to a nucleic acid. 

24. The isolated or purified THAP domain polypeptide of Claim 23 consisting 
essentially of SEQ ID NO: 1. 

25. The isolated or purified THAP domain polypeptide of Claim 23, wherein said 
amino acid identity is determined using an algorithm selected from the group consisting of 
XBLAST with the parameters score=50 and wordlength=3, Gapped BLAST with the default
parameters of XBLAST, and BLAST with the default parameters of XBLAST.

26. The isolated or purified THAP domain polypeptide of Claim 23, wherein said 
nucleic acid comprises a nucleotide sequence selected from the group consisting of SEQ ID NOs: 
140-159. 

27. An isolated or purified nucleic acid which encodes the THAP domain polypeptide 
of claim 23 or a complement thereof. 
28. An isolated or purified PAR4-binding domain polypeptide consisting essentially of
an amino acid sequence selected from the group consisting of amino acids 143-192 of SEQ ID NO:
3, amino acids 132-181 of SEQ ID NO: 4, amino acids 186-234 of SEQ ID NO: 5, SEQ ID NO:
15 and homologs having at least 30% amino acid identity to any aforementioned sequence, wherein
said polypeptide binds to PAR4.

29. The isolated or purified PAR4-binding domain of Claim 28 consisting essentially 
of SEQ ID NO: 15. 

30. The isolated or purified PAR4-binding domain of Claim 28 consisting essentially 
of amino acids 143-193 of SEQ ID NO: 3.

31. The isolated or purified PAR4-binding domain of Claim 28 consisting essentially 
of amino acids 132-181 of SEQ ID NO: 4.

32. The isolated or purified PAR4-binding domain of Claim 28 consisting essentially 
of amino acids 186-234 of SEQ ID NO: 5.

33. The isolated or purified PAR4-binding domain polypeptide of Claim 28, wherein 
said amino acid identity is determined using an algorithm selected from the group consisting of 
XBLAST with the parameters score=50 and wordlength=3, Gapped BLAST with the default
parameters of XBLAST, and BLAST with the default parameters of XBLAST.

34. An isolated or purified nucleic acid which encodes the PAR4-binding domain 
polypeptide of claim 28 or a complement thereof. 

35. An isolated or purified SLC-binding domain polypeptide consisting essentially of 
an amino acid sequence selected from the group consisting of amino acids 143-213 of SEQ ID NO: 
3 and homologs thereof having at least 30% amino acid identity, wherein said polypeptide binds to 
SLC. 

36. The isolated or purified SLC-binding domain polypeptide of Claim 35, wherein 
said amino acid identity is determined using an algorithm selected from the group consisting of 
XBLAST with the parameters score=50 and wordlength=3, Gapped BLAST with the default
parameters of XBLAST, and BLAST with the default parameters of XBLAST.

37. An isolated or purified nucleic acid which encodes the SLC-binding domain 
polypeptide of claim 35 or a complement thereof.

38. A fusion protein comprising an Fc region of an immunoglobulin fused to a 
polypeptide comprising an amino acid sequence selected from the group consisting of amino acids 
143-213 of SEQ ID NO: 3 and homologs thereof having at least 30% amino acid identity. 

39. An oligomeric THAP protein comprising a plurality of THAP polypeptides, 
wherein each THAP polypeptide comprises an amino acid sequence selected from the group 
consisting of amino acids 143-213 of SEQ ID NO: 3 and homologs thereof having at least 30%
amino acid identity. 
40. A medicament comprising an effective amount of a THAP1 polypeptide or an
SLC-binding fragment thereof, together with a pharmaceutically acceptable carrier. 

41. An isolated or purified THAP dimerization domain polypeptide consisting 
essentially of an amino acid sequence selected from the group consisting of amino acids 143 and 
192 of SEQ ID NO: 3 and homologs thereof having at least 30% amino acid identity, wherein said 
polypeptide binds to a THAP-family polypeptide.

42. The isolated or purified THAP dimerization domain polypeptide of Claim 41, 
wherein said amino acid identity is determined using an algorithm selected from the group 
consisting of XBLAST with the parameters score=50 and wordlength=3, Gapped BLAST with the
default parameters of XBLAST, and BLAST with the default parameters of XBLAST.

43. An isolated or purified nucleic acid which encodes the THAP dimerization domain 
polypeptide of claim 41 or a complement thereof. 

44. An expression vector comprising a promoter operably linked to a nucleic acid 
having a nucleotide sequence selected from the group consisting of SEQ ID NOs: 160-175 and 
portions thereof comprising at least 18 consecutive nucleotides.

45. The expression vector of Claim 44, wherein said promoter is a promoter which is 
not operably linked to said nucleic acid selected from the group consisting of SEQ ID NOs: 160-175
in a naturally occurring genome.

46. A host cell comprising the expression vector of claim 44. 

47. An expression vector comprising a promoter operably linked to a nucleic acid 
encoding a polypeptide comprising an amino acid sequence selected from the group consisting of 
SEQ ID NOs: 1-114 and portions thereof comprising at least 18 consecutive nucleotides.

48. The expression vector of Claim 47, wherein said promoter is a promoter which is 
not operably linked to said nucleic acid selected from the group consisting of SEQ ID NOs: 160-175
in a naturally occurring genome.

49. A host cell comprising the expression vector of claim 47. 

50. A method of identifying a candidate inhibitor of a THAP-family polypeptide, a 
candidate inhibitor of apoptosis, or a candidate compound for the treatment of a cell proliferative 
disorder, said method comprising: 

contacting a THAP-family polypeptide comprising an amino acid sequence 
selected from the group consisting of SEQ ID NOs: 1-114 or a fragment comprising a span
of at least 6 contiguous amino acids of a polypeptide comprising an amino acid sequence
selected from the group consisting of SEQ ID NOs: 1-114 with a test compound; and

determining whether said compound selectively binds to said polypeptide, wherein 
a determination that said compound selectively binds to said polypeptide indicates that said 
compound is a candidate inhibitor of a THAP-family polypeptide, a candidate inhibitor of 
apoptosis, or a candidate compound for the treatment of a cell proliferative disorder. 
51. A method of identifying a candidate inhibitor of apoptosis, a candidate compound
for the treatment of a cell proliferative disorder, or a candidate inhibitor of a THAP-family 
polypeptide of SEQ ID NOs: 1-114 or a fragment comprising a span of at least 6 contiguous amino 
acids of a polypeptide according to SEQ ID NOs: 1-114, said method comprising: 

contacting said THAP-family polypeptide with a test compound; and 
determining whether said compound selectively inhibits at least one biological 
activity selected from the group consisting of interaction with a THAP-family target 
protein, binding to a nucleic acid sequence, binding to PAR-4, binding to SLC, binding to 
PML, binding to a polypeptide found in PML-NBs, localization to PML-
NBs, targeting a THAP-family target protein to PML-NBs, and inducing apoptosis, wherein 
a determination that said compound selectively inhibits said at least one biological activity 
of said polypeptide indicates that said compound is a candidate inhibitor of a THAP-family 
polypeptide, a candidate inhibitor of apoptosis, or a candidate compound for the treatment 
of a cell proliferative disorder. 

52. A method of identifying a candidate inhibitor of apoptosis, a candidate compound 
for the treatment of a cell proliferative disorder, or a candidate inhibitor of a THAP-family 
polypeptide of SEQ ID NOs: 1-114 or a fragment comprising a span of at least 6 contiguous amino
acids of a polypeptide according to SEQ ID NOs: 1-114, said method comprising:

contacting a cell comprising said THAP-family polypeptide with a test compound; and

determining whether said compound selectively inhibits at least one biological 
activity selected from the group consisting of interaction with a THAP-family target 
protein, binding to a nucleic acid sequence, binding to PAR-4, binding to SLC, binding to 
PML, binding to a polypeptide found in PML-NBs, localization to PML-
NBs, targeting a THAP-family target protein to PML-NBs, and inducing apoptosis, wherein 
a determination that said compound selectively inhibits said at least one biological activity 
of said polypeptide indicates that said compound is a candidate inhibitor of a THAP-family 
polypeptide, a candidate inhibitor of apoptosis, or a candidate compound for the treatment 
of a cell proliferative disorder. 

53. A method of identifying a candidate modulator of THAP-family activity, said 
method comprising: 

providing a THAP-family polypeptide of SEQ ID NOs: 1-114 or a fragment
comprising a span of at least 6 contiguous amino acids of a polypeptide according to SEQ
ID NOs: 1-114;

providing a THAP-family target polypeptide or a fragment thereof; and 
determining whether a test compound selectively modulates the ability of said 
THAP-family polypeptide to bind to said THAP-family target polypeptide, wherein a 
determination that said test compound selectively modulates the ability of said THAP-family
polypeptide to bind to said THAP-family target polypeptide indicates that said
compound is a candidate modulator of THAP-family activity. 

54. The method of Claim 53, wherein said THAP-family polypeptide is provided by a 
first expression vector comprising a nucleic acid encoding a THAP-family polypeptide of SEQ ID 
NOs: 1-114 or a fragment comprising a span of at least 6 contiguous amino acids of a
polypeptide according to SEQ ID NOs: 1-114, and wherein said THAP-family target polypeptide is
provided by a second expression vector comprising a nucleic acid encoding a THAP-family target
polypeptide, or a fragment thereof.

55. The method of Claim 53, wherein said THAP-family activity is apoptosis activity. 

56. The method of Claim 53, wherein said THAP-family target protein is PAR-4. 

57. The method of Claim 53, wherein said THAP-family polypeptide is a THAP-1, 
THAP-2 or THAP-3 protein and said THAP-family target protein is PAR-4. 

58. The method of Claim 53, wherein said THAP-family target protein is SLC. 

59. A method of modulating apoptosis in a cell comprising modulating the activity of a 
THAP-family protein. 

60. The method of Claim 59, wherein said THAP-family protein is selected from the 
group consisting of SEQ ID NOs: 1-114. 

61. The method of Claim 59, wherein modulating the activity of a THAP-family 
protein comprises modulating the interaction of a THAP-family protein and a THAP-family target 
protein. 

62. The method of Claim 59, wherein modulating the activity of a THAP-family 
protein comprises modulating the interaction of a THAP-family protein and a PAR4 protein. 

63. A method of identifying a candidate activator of a THAP-family polypeptide, a 
candidate activator of apoptosis, or a candidate compound for the treatment of a cell proliferative 
disorder, said method comprising: 

contacting a THAP-family polypeptide comprising an amino acid sequence 
selected from the group consisting of SEQ ID NOs: 1-98 or a fragment comprising a span
of at least 6 contiguous amino acids of a polypeptide comprising an amino acid sequence 
selected from the group consisting of SEQ ID NOs: 1-98 with a test compound; and 

determining whether said compound selectively binds to said polypeptide, wherein 
a determination that said compound selectively binds to said polypeptide indicates that said 
compound is a candidate activator of a THAP-family polypeptide, a candidate activator of 
apoptosis, or a candidate compound for the treatment of a cell proliferative disorder. 

64. A method of identifying a candidate activator of apoptosis, a candidate compound 
for the treatment of a cell proliferative disorder, or a candidate activator of a THAP-family 
polypeptide of SEQ ID NOs: 1-98 or a fragment comprising a span of at least 6 contiguous amino
acids of a polypeptide according to SEQ ID NOs: 1-98, said method comprising: 

contacting said THAP-family polypeptide with a test compound; and 
determining whether said compound selectively activates at least one biological 
activity selected from the group consisting of interaction with a THAP-family target 
protein, binding to a nucleic acid sequence, binding to PAR-4, binding to SLC, binding to 
PML, binding to a polypeptide found in PML-NBs, localization to PML-
NBs, targeting a THAP-family target protein to PML-NBs, and inducing apoptosis, wherein 
a determination that said compound selectively activates said at least one biological activity 
of said polypeptide indicates that said compound is a candidate activator of a THAP-family 
polypeptide, a candidate activator of apoptosis, or a candidate compound for the treatment 
of a cell proliferative disorder. 

65. A method of identifying a candidate activator of apoptosis, a candidate compound 
for the treatment of a cell proliferative disorder, or a candidate activator of a THAP-family 
polypeptide of SEQ ID NOs: 1-98 or a fragment comprising a span of at least 6 contiguous amino
acids of a polypeptide according to SEQ ID NOs: 1-98, said method comprising: 

contacting a cell comprising said THAP-family polypeptide with a test compound; and

determining whether said compound selectively activates at least one biological 
activity selected from the group consisting of interaction with a THAP-family target 
protein, binding to a nucleic acid sequence, binding to PAR-4, binding to SLC, binding to 
PML, binding to a polypeptide found in PML-NBs, localization to PML-
NBs, targeting a THAP-family target protein to PML-NBs, and inducing apoptosis, wherein 
a determination that said compound selectively activates said at least one biological activity 
of said polypeptide indicates that said compound is a candidate activator of a THAP-family 
polypeptide, a candidate activator of apoptosis, or a candidate compound for the treatment 
of a cell proliferative disorder. 

66. A method of ameliorating a condition associated with the activity of SLC in an 
individual comprising administering a polypeptide comprising the SLC-binding domain of a THAP-family
protein to said individual.

67. The method of Claim 66, wherein said polypeptide comprises a fusion protein 
comprising an Fc region of an immunoglobulin fused to a polypeptide comprising an amino acid 
sequence selected from the group consisting of amino acids 143-213 of SEQ ID NO: 3 and 
homologs thereof having at least 30% amino acid identity. 

68. The method of claim 66, wherein said polypeptide comprises an oligomeric THAP 
protein comprising a plurality of THAP polypeptides, wherein each THAP polypeptide comprises 
an amino acid sequence selected from the group consisting of amino acids 143-213 of SEQ ID NO: 3
and homologs thereof having at least 30% amino acid identity. 

69. A method of modulating angiogenesis in an individual comprising modulating the 
activity of a THAP-family protein in said individual. 

70. The method of Claim 69, wherein said THAP-family protein is selected from the 
group consisting of SEQ ID NOs: 1-114. 

71. The method of Claim 69, wherein said modulation is inhibition. 

72. The method of Claim 69, wherein said modulation is induction. 

73. A method of reducing cell death in an individual comprising inhibiting the activity 
of a THAP-family protein in said individual. 

74. The method of Claim 73, wherein said THAP-family protein is selected from the 
group consisting of SEQ ID NOs: 1-114. 

75. The method according to Claim 73, wherein the activity of said THAP-family 
protein is inhibited in the CNS. 

76. A method of reducing inflammation or an inflammatory disorder in an individual 
comprising modulating the activity of a THAP-family protein in said individual. 

77. The method of Claim 76, wherein said THAP-family protein is selected from the 
group consisting of SEQ ID NOs: 1-114. 

78. A method of reducing the extent of cancer in an individual comprising modulating 
the activity of a THAP-family protein in said individual. 

79. The method of Claim 78, wherein said THAP-family protein is selected from the 
group consisting of SEQ ID NOs: 1-114. 

80. The method of Claim 78, wherein increasing the activity of said THAP-family
protein induces apoptosis, inhibits cell division, inhibits metastatic potential, reduces tumor burden, 
increases sensitivity to chemotherapy or radiotherapy, kills a cancer cell, inhibits the growth of a 
cancer cell, kills an endothelial cell, inhibits the growth of an endothelial cell, inhibits angiogenesis, 
or induces tumor regression. 



BNSDOCID: <WO 03051917A2J_> 



WO 03/051917 PCT/EP02/14027






Drawing sheet 2/21: tissue lane labels brain, heart, skeletal muscle, colon, thymus, spleen, kidney, liver, small intestine, placenta and leukocytes.







Figure 10
THAP1, THAP2 and THAP3 schematics showing the THAP domain and the NLS (marked positions: THAP1 146, 162 and 213; THAP3 189, 205 and 239). Two-hybrid bait: Par4 wt or Par4DD, scored + for each THAP protein. GST pull-down lanes: Input 10%, GST and GST-Par4DD, each with T1, T2 and T3; size markers 250, 105, 75, 50, 35, 30 and 25.






FIGURE 11 







Figure 12
THAP1 constructs tested with the two-hybrid bait chemokine SLC/CCL21: THAP1 wt (THAP domain, PRO and NLS; positions 1, 90, 110, 146, 162 and 213), deletion mutants THAP1-N1 (1-90), THAP1-N2, THAP1-N3, THAP1-C1, THAP1-C2 and THAP1-C3 (further marked positions 120, 143, 166 and 192), and mutants THAP1 Δ(QRCRR) (positions 168-172) and THAP1 RR/AA (positions 171-172).






Figure 13
SLC/CCL21 wt (positions 24, 102 and 134) and SLC/CCL21 ΔCOOH (positions 24 and 102), showing the CCR7-binding domain and the SLC-specific basic extension; columns: two-hybrid bait THAP1 and in vitro binding to GST-THAP1.






Figure 16
THAP1 wt, THAP1-N1, THAP1-N2, THAP1-N3, THAP1-C1, THAP1-C2, THAP1-C3, THAP1 Δ(QRCRR) and THAP1 RR/AA (THAP domain, PRO and NLS; marked positions 1, 90, 110, 120, 143, 168, 171, 172 and 213) tested with the two-hybrid bait THAP1. In vitro binding panel: lanes WT, N1, N2, N3, C1, C2 and C3 for Input 10%, GST and GST-THAP1, with size markers in kDa.







FIG. 17
THAP1 cDNA form a (exons 1, 2 and 3) and form b (exons 1 and 3), with AUG start codons marked; transcript lengths 2173 nt and 1978 nt; scale 250 bp. THAP1 protein (positions 1, 90, 120, 143 and 213) with the products THAP1 C2 (94) and THAP1 C3 (71), a GFP panel, and size markers 75, 50, 35, 30 and 25.






SEQUENCE LISTING 

<110> Endocube SAS 

Centre National de la Recherche Scientifique (CNRS) 

Girard, Jean-Philippe 
Kossida, Sophia

Amalric, Francois 

Clouaire, Thomas 

<120> NOVEL DEATH ASSOCIATED PROTEINS, AND 

THAP1 AND PAR4 PATHWAYS IN APOPTOSIS CONTROL



<130> G3138 PCT (BIOBANK.009VPC)

<140> Unknown 
<141> 2002-12-10 

<150> US 60/341,997 
<151> 2001-12-18 

<160> 263 

<170> FastSEQ for Windows Version 4.0 

<210> 1 
<211> 74 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> THAP domain consensus
<221> UNSURE 

<222> 2-5, 7-21, 23-31, 33-49, 51-52, 55-73 
<223> Xaa = any of the twenty amino acids 

<400> 1 

Cys Xaa Xaa Xaa Xaa Cys Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 

1 5 10 15

Xaa Xaa Xaa Xaa Xaa Pro Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Trp 

20 25 30 

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 

35 40 45 

Xaa Cys Xaa Xaa His Phe Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 

50 55 60 

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Pro 
65 70 



<210> 2 
<211> 81 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> THAP domain consensus 
<221> UNSURE 
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74-75, 


80 














<223> Xaa - 


any 


of 


the 


twenty amino 


<4U0> 2 
















Met Pro 


Xaa 


Xaa 


Cys 


Xaa 


Xaa 


Xaa 


Xaa 


1 






5 










Xaa Xaa 


Xaa 


Xaa 


Xaa 


Phe 


His 


Xaa 


Phe 






20 










25 


Xaa Xaa 


Xaa 


Trp 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 




35 










40 




Xaa Xaa 


Xaa 


Xaa 


Xaa 


Cys 


Ser 


Xaa 


His 


50 










55 






Xaa Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Xaa 


Leu 


Lys 


65 








70 






Phe 

















10 15 
Pro Xaa Xaa Xaa Xaa Xaa Xaa
30

Arg Xaa Xaa Xaa Xaa Xaa Xaa
45

Phe Xaa Xaa Xaa Xaa Phe Xaa
60

Xaa Xaa Ala Val Pro Thr Xaa
75 80 



<210> 3 






<211> 213 




<212> PRT 




<213> Homo sapiens


<400> 3 






Met 


Val 


Gln


Ser 


1 








Asp 


Lys 


Pro 


Val 








20 


Cys 


Lys 


Glu 


Trp 






35 




Lys 


Tyr 


Ser 


Ser 




50 






Arg 


Glu 


Cys 


Asn 


65 








Phe 


Leu 


Cys 


Thr 


Gln


Glu 


Gln


Leu 








100 


Asp 


Ala 


Ala 


Ile






115 




Leu 


Ser 


Val 


Phe 




130 






Gln


Arg 


Lys Arg 


145 








Lys 


Lys 


Leu 


Lys 


Leu 


Glu 


Lys 


Leu 








180 


Val 


Ser 


Glu Arg 






195 




Val 


Glu 


Val 


Pro 




210 







5 10 15

Ser Phe His Lys Phe Pro Leu Thr Arg Pro Ser Leu

25 30
Glu Ala Ala Val Arg Arg Lys Asn Phe Lys Pro Thr

40 45
Ile Cys Ser Glu His Phe Thr Pro Asp Cys Phe Lys

55 60
Asn Lys Leu Leu Lys Glu Asn Ala Val Pro Thr Ile

70 75 80

Glu Pro His Asp Lys Lys Glu Asp Leu Leu Glu Pro
85 90 95

Pro Pro Pro Pro Leu Pro Pro Pro Val Ser Gln Val

105 110
Gly Leu Leu Met Pro Pro Leu Gln Thr Pro Val Asn

120 125
Cys Asp His Asn Tyr Thr Val Glu Asp Thr Met His

135 140
Ile His Gln Leu Glu Gln Gln Val Glu Lys Leu Arg
150 155 160

Thr Ala Gln Gln Arg Cys Arg Arg Gln Glu Arg Gln
165 170 175

Lys Glu Val Val His Phe Gln Lys Glu Lys Asp Asp

185 190
Gly Tyr Val Ile Leu Pro Asn Asp Tyr Phe Glu Ile
200 205



<210> 4 
<211> 228 
<212> PRT 

<213> Homo sapiens 
<400> 4 
Met Pro Thr Asn Cys Ala Ala Ala Gly Cys Ala Thr Thr Tyr Asn Lys
1 5 10 15

His Ile Asn Ile Ser Phe His Arg Phe Pro Leu Asp Pro Lys Arg Arg
20 25 30

Lys Glu Trp Val Arg Leu Val Arg Arg Lys Asn Phe Val Pro Gly Lys
35 40 45

His Thr Phe Leu Cys Ser Lys His Phe Glu Ala Ser Cys Phe Asp Leu
50 55 60

Thr Gly Gln Thr Arg Arg Leu Lys Met Asp Ala Val Pro Thr Ile Phe
65 70 75 80

Asp Phe Cys Thr His Ile Lys Ser Met Lys Leu Lys Ser Arg Asn Leu
85 90 95

Leu Lys Lys Asn Asn Ser Cys Ser Pro Ala Gly Pro Ser Asn Leu Lys
100 105 110

Ser Asn Ile Ser Ser Gln Gln Val Leu Leu Glu His Ser Tyr Ala Phe
115 120 125

Arg Asn Pro Met Glu Ala Lys Lys Arg Ile Ile Lys Leu Glu Lys Glu
130 135 140

Ile Ala Ser Leu Arg Arg Lys Met Lys Thr Cys Leu Gln Lys Glu Arg
145 150 155 160

Arg Ala Thr Arg Arg Trp Ile Lys Ala Thr Cys Leu Val Lys Asn Leu
165 170 175

Glu Ala Asn Ser Val Leu Pro Lys Gly Thr Ser Glu His Met Leu Pro
180 185 190

Thr Ala Leu Ser Ser Leu Pro Leu Glu Asp Phe Lys Ile Leu Glu Gln
195 200 205

Asp Gln Gln Asp Lys Thr Leu Leu Ser Leu Asn Leu Lys Gln Thr Lys
210 215 220

Ser Thr Phe Ile
225



<210> 5
<211> 239
<212> PRT

<213> Homo sapiens
<400> 5

Met Pro Lys Ser Cys Ala Ala Arg Gln Cys Cys Asn Arg Tyr Ser Ser
1 5 10 15

Arg Arg Lys Gln Leu Thr Phe His Arg Phe Pro Phe Ser Arg Pro Glu
20 25 30

Leu Leu Lys Glu Trp Val Leu Asn Ile Gly Arg Gly Asn Phe Lys Pro
35 40 45

Lys Gln His Thr Val Ile Cys Ser Glu His Phe Arg Pro Glu Cys Phe
50 55 60

Ser Ala Phe Gly Asn Arg Lys Asn Leu Lys His Asn Ala Val Pro Thr
65 70 75 80

Val Phe Ala Phe Gln Asp Pro Thr Gln Gln Val Arg Glu Asn Thr Asp
85 90 95

Pro Ala Ser Glu Arg Gly Asn Ala Ser Ser Ser Gln Lys Glu Lys Val
100 105 110

Leu Pro Glu Ala Gly Ala Gly Glu Asp Ser Pro Gly Arg Asn Met Asp
115 120 125

Thr Ala Leu Glu Glu Leu Gln Leu Pro Pro Asn Ala Glu Gly His Val
130 135 140

Lys Gln Val Ser Pro Arg Arg Pro Gln Ala Thr Glu Ala Val Gly Arg
145 150 155 160

Pro Thr Gly Pro Ala Gly Leu Arg Arg Thr Pro Asn Lys Gln Pro Ser
165 170 175

Asp His Ser Tyr Ala Leu Leu Asp Leu Asp Ser Leu Lys Lys Lys Leu
180 185 190

Phe Leu Thr Leu Lys Glu Asn Glu Lys Leu Arg Lys Arg Leu Gln Ala
195 200 205

Gln Arg Leu Val Met Arg Arg Met Ser Ser Arg Leu Arg Ala Cys Lys
210 215 220

Gly His Gln Gly Leu Gln Ala Arg Leu Gly Pro Glu Gln Gln Ser
225 230 235



<210> 6 

<211> 577 

<212> PRT 

<213> Homo sapiens 



<400> 6 



Met Val Ile Cys Cys Ala Ala Val Asn Cys Ser Asn Arg Gln Gly Lys
1 5 10 15

Gly Glu Lys Arg Ala Val Ser Phe His Arg Phe Pro Leu Lys Asp Ser
20 25 30

Lys Arg Leu Ile Gln Trp Leu Lys Ala Val Gln Arg Asp Asn Trp Thr
35 40 45

Pro Thr Lys Tyr Ser Phe Leu Cys Ser Glu His Phe Thr Lys Asp Ser
50 55 60

Phe Ser Lys Arg Leu Glu Asp Gln His Arg Leu Leu Lys Pro Thr Ala
65 70 75 80

Val Pro Ser Ile Phe His Leu Thr Glu Lys Lys Arg Gly Ala Gly Gly
85 90 95

His Gly Arg Thr Arg Arg Lys Asp Ala Ser Lys Ala Thr Gly Gly Val
100 105 110

Arg Gly His Ser Ser Ala Ala Thr Gly Arg Gly Ala Ala Gly Trp Ser
115 120 125

Pro Ser Ser Ser Gly Asn Pro Met Ala Lys Pro Glu Ser Arg Arg Leu
130 135 140

Lys Gln Ala Ala Leu Gln Gly Glu Ala Thr Pro Arg Ala Ala Gln Glu
145 150 155 160

Ala Ala Ser Gln Glu Gln Ala Gln Gln Ala Leu Glu Arg Thr Pro Gly
165 170 175

Asp 


Gly 


Leu 




Thr 


Met 


XT — T 

Val


Ala


Gly


Ser


Gln


Gly Lys 


Ala 


Glu 


Ala 








1 oU 










IOC 

185 










190 






Ser 


Ala 


Thr 
195 


Asp 


Ala 


Gly 


Asp 


Glu 
200 




iU. CL 


X 11JL 


Ser 


Ser 
205 


Tip 


VJ-L Li 




Gly Val Thr Asp Lys Ser Gly Ile Ser Met Asp Asp Phe Thr Pro Pro
210 215 220

Gly Ser Gly Ala Cys Lys Phe Ile Gly Ser Leu His Ser Tyr Ser Phe
225 230 235 240

Ser Ser Lys His Thr Arg Glu Arg Pro Ser Val Pro Arg Glu Pro Ile
245 250 255

Asp Arg Lys Arg Leu Lys Lys Asp Val Glu Pro Ser Cys Ser Gly Ser
260 265 270

Ser Leu Gly Pro Asp Lys Gly Leu Ala Gln Ser Pro Pro Ser Ser Ser
275 280 285

Leu Thr Ala Thr Pro Gln Lys Pro Ser Gln Ser Pro Ser Ala Pro Pro
290 295 300

Ala Asp Val Thr Pro Lys Pro Ala Thr Glu Ala Val Gln Ser Glu His
305 310 315 320

Ser Asp Ala Ser Pro Met Ser Ile Asn Glu Val Ile Leu Ser Ala Ser
325 330 335

Gly Ala Cys Lys Leu Ile Asp Ser Leu His Ser Tyr Cys Phe Ser Ser
340 345 350

Arg Gln Asn Lys Ser Gln Val Cys Cys Leu Arg Glu Gln Val Glu Lys
355 360 365

Lys Asn Gly Glu Leu Lys Ser Leu Arg Gln Arg Val Ser Arg Ser Asp
370 375 380

Ser Gln Val Arg Lys Leu Gln Glu Lys Leu Asp Glu Leu Arg Arg Val
385 390 395 400

Ser Val Pro Tyr Pro Ser Ser Leu Leu Ser Pro Ser Arg Glu Pro Pro
405 410 415

Lys Met Asn Pro Val Val Glu Pro Leu Ser Trp Met Leu Gly Thr Trp
420 425 430

Leu Ser Asp Pro Pro Gly Ala Gly Thr Tyr Pro Thr Leu Gln Pro Phe
435 440 445

Gln Tyr Leu Glu Glu Val His Ile Ser His Val Gly Gln Pro Met Leu
450 455 460

Asn Phe Ser Phe Asn Ser Phe His Pro Asp Thr Arg Lys Pro Met His
465 470 475 480

Arg Glu Cys Gly Phe Ile Arg Leu Lys Pro Asp Thr Asn Lys Val Ala
485 490 495

Phe Val Ser Ala Gln Asn Thr Gly Val Val Glu Val Glu Glu Gly Glu
500 505 510

Val Asn Gly Gln Glu Leu Cys Ile Ala Ser His Ser Ile Ala Arg Ile
515 520 525

Ser Phe Ala Lys Glu Pro His Val Glu Gln Ile Thr Arg Lys Phe Arg
530 535 540

Leu Asn Ser Glu Gly Lys Leu Glu Gln Thr Val Ser Met Ala Thr Thr
545 550 555 560

Thr Gln Pro Met Thr Gln His Leu His Val Thr Tyr Lys Lys Val Thr
565 570 575

Pro



<210> 7
<211> 395
<212> PRT
<213> Homo sapiens
<400> 7

Met Pro Arg Tyr Cys Ala Ala Ile Cys Cys Lys Asn Arg Arg Gly Arg
1 5 10 15

Asn Asn Lys Asp Arg Lys Leu Ser Phe Tyr Pro Phe Pro Leu His Asp
20 25 30

Lys Glu Arg Leu Glu Lys Trp Leu Lys Asn Met Lys Arg Asp Ser Trp
35 40 45

Val Pro Ser Lys Tyr Gln Phe Leu Cys Ser Asp His Phe Thr Pro Asp
50 55 60

Ser Leu Asp Ile Arg Trp Gly Ile Arg Tyr Leu Lys Gln Thr Ala Val
65 70 75 80

Pro Thr Ile Phe Ser Leu Pro Glu Asp Asn Gln Gly Lys Asp Pro Ser
85 90 95

Lys Lys Lys Ser Gln Lys Lys Asn Leu Glu Asp Glu Lys Glu Val Cys
100 105 110

Pro Lys Ala Lys Ser Glu Glu Ser Phe Val Leu Asn Glu Thr Lys Lys
115 120 125

Asn Ile Val Asn Thr Asp Val Pro His Gln His Pro Glu Leu Leu His
130 135 140

Ser Ser Ser Leu Val Lys Pro Pro Ala Pro Lys Thr Gly Ser Ile Gln
145 150 155 160

Asn Asn Met Leu Thr Leu Asn Leu Val Lys Gln His Thr Gly Lys Pro
165 170 175

Glu Ser Thr Leu Glu Thr Ser Val Asn Gln Asp Thr Gly Arg Gly Gly
180 185 190

Phe His Thr Cys Phe Glu Asn Leu Asn Ser Thr Thr Ile Thr Leu Thr
195 200 205

Thr Ser Asn Ser Glu Ser Ile His Gln Ser Leu Glu Thr Gln Glu Val
210 215 220

Leu Glu Val Thr Thr Ser His Leu Ala Asn Pro Asn Phe Thr Ser Asn
225 230 235 240

Ser Met Glu Ile Lys Ser Ala Gln Glu Asn Pro Phe Leu Phe Ser Thr
245 250 255

Ile Asn Gln Thr Val Glu Glu Leu Asn Thr Asn Lys Glu Ser Val Ile
260 265 270

Ala Ile Phe Val Pro Ala Glu Asn Ser Lys Pro Ser Val Asn Ser Phe
275 280 285

116 otn Ma Gln Lys Glu Thr Thr Glu Met Glu Thr Asp Ile Glu
290 295 300

Asp Ser Leu Tyr Lys Asp Val Asp Tyr Gly Thr Glu Val Leu Gln Ile
305 310 315 320

Glu His Ser Tyr Cys Arg Gln Asp Ile Asn Lys Glu His Leu Trp Gln
325 330 335

Lys Val Ser Lys Leu His Ser Lys Ile Thr Leu Leu Glu Leu Lys Glu
340 345 350

Gln Gln Thr Leu Gly Arg Leu Lys Ser Leu Glu Ala Leu Ile Arg Gln
355 360 365

Leu Lys Gln Glu Asn Trp Leu Ser Glu Glu Asn Val Lys Ile Ile Glu
370 375 380

Asn His Phe Thr Thr Tyr Glu Val Thr Met Ile
385 390 395



<210> 8 

<211> 222 

<212> PRT 

<213> Homo sapiens 

<400> 8 

Met Val Lys Cys Cys Ser Ala Ile Gly Cys Ala Ser Arg Cys Leu Pro
1 5 10 15

Asn Ser Lys Leu Lys Gly Leu Thr Phe His Val Phe Pro Thr Hp Glu
20 25 30

Asn Ile Lys Arg Lys Trp Val Leu Ala Met Lys Arg Leu Asp Val Asn
35 40 45

Ala Ala Gly Ile Trp Glu Pro Lys Lys Gly Asp Val Leu Cys Ser Arg
50 55 60

His Phe Lys Lys Thr Asp Phe Asp Arg Ser Ala Pro Asn Ile Lys Leu
65 70 75 80

Lys Pro Gly Val Ile Pro Ser Ile Phe Asp Ser Pro Tyr His Leu Gln
85 90 95

Gly Lys Arg Glu Lys Leu His Cys Arg Lys Asn Phe Thr Leu Lys Thr
100 105 110

Val Pro Ala Thr Asn Tyr Asn His His Leu Val Gly Ala Ser Ser Cys
115 120 125

Ile Glu Glu Phe Gln Ser Gln Phe Ile Phe Glu His Ser Tyr Ser Val
130 135 140

Met Asp Ser Pro Lys Lys Leu Lys His Lys Leu Asp His Val Ile Gly
145 150 155 160

Glu Leu Glu Asp Thr Lys Glu Ser Leu Arg Asn Val Leu Asp Arg Glu
165 170 175

Lys Arg Phe Gln Lys Ser Leu Arg Lys Thr Ile Arg Glu Leu Lys Asp
180 185 190

Glu Cys Leu Ile Ser Gln Glu Thr Ala Asn Arg Leu Asp Thr Phe Cys
195 200 205

Trp Asp Cys Cys Gln Glu Ser Ile Glu Gln Asp Tyr Ile Ser
210 215 220



<210> 9
<211> 309 
<212> PRT 

<213> Homo sapiens 
<400> 9 

Met Pro Arg His Cys Ser Ala Ala Gly Cys Cys Thr Arg Asp Thr Arg
1 5 10 15

Glu Thr Arg Asn Arg Gly Ile Ser Phe His Arg Leu Pro Lys Lys Asp
20 25 30

Asn Pro Arg Arg Gly Leu Trp Leu Ala Asn Cys Gln Arg Leu Asp Pro
35 40 45

Ser Gly Gln Gly Leu Trp Asp Pro Ala Ser Glu Tyr Ile Tyr Phe Cys
50 55 60

Ser Lys His Phe Glu Glu Asp Cys Phe Glu Leu Val Gly Ile Ser Gly
65 70 75 80

Tyr His Arg Leu Lys Glu Gly Ala Val Pro Thr Ile Phe Glu Ser Phe
85 90 95

Ser Lys Leu Arg Arg Thr Thr Lys Thr Lys Gly His Ser Tyr Pro Pro
100 105 110

Gly Pro Pro Glu Val Ser Arg Leu Arg Arg Cys Arg Lys Arg Cys Ser
115 120 125

Glu Gly Arg Gly Pro Thr Thr Pro Phe Ser Pro Pro Pro Pro Ala Asp
130 135 140

Val Thr Cys Phe Pro Val Glu Glu Ala Ser Ala Pro Ala Thr Leu Pro
145 150 155 160

Ala Ser Pro Ala Gly Arg Leu Glu Pro Gly Leu Ser Ser Pro Phe Ser
165 170 175

Asp Leu Leu Gly Pro Leu Gly Ala Gln Ala Asp Glu Ala Gly Cys Ser
180 185 190

Ala Gln Pro Ser Pro Glu Arg Gln Pro Ser Pro Leu Glu Pro Arg Pro
195 200 205

Val Ser Pro Ser Ala Tyr Met Leu Arg Leu Pro Pro Pro Ala Gly Ala
210 215 220

Tyr Ile Gln Asn Glu His Ser Tyr Gln Val Gly Ser Ala Leu Leu Trp
225 230 235 240

Lys Arg Arg Ala Glu Ala Ala Leu Asp Ala Leu Asp Lys Ala Gln Arg
245 250 255

Gln Leu Gln Ala Cys Lys Arg Arg Glu Gln Arg Leu Arg Leu Arg Leu
260 265 270

Thr Lys Leu Gln Gln Glu Arg Ala Arg Glu Lys Arg Ala Gln Ala Asp
275 280 285

Ala Arg Gln Thr Leu Lys Glu His Val Gln Asp Phe Ala Met Gln Leu
290 295 300

Ser Ser Ser Met Ala
305



<210> 10 

<211> 274 

<212> PRT 

<213> Homo sapiens 

<400> 10 

Met Pro Lys Tyr Cys Arg Ala Pro Asn Cys Ser Asn Thr Ala Gly Arg
1 5 10 15

Leu Gly Ala Asp Asn Arg Pro Val Ser Phe Tyr Lys Phe Pro Leu Lys
20 25 30

Asp Gly Pro Arg Leu Gln Ala Trp Leu Gln His Met Gly Cys Glu His
35 40 45

Trp Val Pro Ser Cys His Gln His Leu Cys Ser Glu His Phe Thr Pro
50 55 60

Ser Cys Phe Gln Trp Arg Trp Gly Val Arg Tyr Leu Arg Pro Asp Ala
65 70 75 80

Val Pro Ser Ile Phe Ser Arg Gly Pro Pro Ala Lys Ser Gln Arg Arg
85 90 95

Thr Arg Ser Thr Gln Lys Pro Val Ser Pro Pro Pro Pro Leu Gln Lys
100 105 110

Asn Thr Pro Leu Pro Gln Ser Pro Ala Ile Pro Val Ser Gly Pro Val
115 120 125

Arg Leu Val Val Leu Gly Pro Thr Ser Gly Ser Pro Lys Thr Val Ala
130 135 140

Thr Met Leu Leu Thr Pro Leu Ala Pro Ala Pro Thr Pro Glu Arg Ser
145 150 155 160

Gln Pro Glu Val Pro Ala Gln Gln Ala Gln Thr Gly Leu Gly Pro Val
165 170 175

Leu Gly Ala Leu Gln Arg Arg Val Arg Arg Leu Gln Arg Cys Gln Glu
180 185 190

Arg His Gln Ala Gln Leu Gln Ala Leu Glu Arg Leu Ala Gln Gln Leu
195 200 205

His Gly Glu Ser Leu Leu Ala Arg Ala Arg Arg Gly Leu Gln Arg Leu
210 215 220

Thr Thr Ala Gln Thr Leu Gly Pro Glu Glu Ser Gln Thr Phe Thr Ile
225 230 235 240

Ile Cys Gly Gly Pro Asp Ile Ala Met Val Leu Ala Gln Asp Pro Ala
245 250 255

Pro Ala Thr Val Asp Ala Lys Pro Glu Leu Leu Asp Thr Arg Ile Pro
260 265 270

Ser Ala



<210> 11 
<211> 903 
<212> PRT 

<213> Homo sapiens 
<400> 11 

Met Thr Arg Ser Cys Ser Ala Val Gly Cys Ser Thr Arg Asp Thr Val
1 5 10 15

Leu Ser Arg Glu Arg Gly Leu Ser Phe His Gln Phe Pro Thr Asp Thr
20 25 30

Ile Gln Arg Ser Lys Trp Ile Arg Ala Val Asn Arg Val Asp Pro Arg
35 40 45

Ser Lys Lys Ile Trp Ile Pro Gly Pro Gly Ala Ile Leu Cys Ser Lys
50 55 60

His Phe Gln Glu Ser Asp Phe Glu Ser Tyr Gly Ile Arg Arg Lys Leu
65 70 75 80

Lys Lys Gly Ala Val Pro Ser Val Ser Leu Tyr Lys Ile Pro Gln Gly
85 90 95

Val His Leu Lys Gly Lys Ala Arg Gln Lys Ile Leu Lys Gln Pro Leu
100 105 110

Pro Asp Asn Ser Gln Glu Val Ala Thr Glu Asp His Asn Tyr Ser Leu
115 120 125

Lys Thr Pro Leu Thr Ile Gly Ala Glu Lys Leu Ala Glu Val Gln Gln
130 135 140

Met Leu Gln Val Ser Lys Lys Arg Leu Ile Ser Val Lys Asn Tyr Arg
145 150 155 160

Met Ile Lys Lys Arg Lys Gly Leu Arg Leu Ile Asp Ala Leu Val Glu
165 170 175

Glu Lys Leu Leu Ser Glu Glu Thr Glu Cys Leu Leu Arg Ala Gln Phe
180 185 190

Ser Asp Phe Lys Trp Glu Leu Tyr Asn Trp Arg Glu Thr Asp Glu Tyr
195 200 205

Ser Ala Glu Met Lys Gln Phe Ala Cys Thr Leu Tyr Leu Cys Ser Ser
210 215 220

Lys Val Tyr Asp Tyr Val Arg Lys Ile Leu Lys Leu Pro His Ser Ser
225 230 235 240

Ile Leu Arg Thr Trp Leu Ser Lys Cys Gln Pro Ser Pro Gly Phe Asn
245 250 255

Ser Asn Ile Phe Ser Phe Leu Gln Arg Arg Val Glu Asn Gly Asp Gln
260 265 270

Leu Tyr Gln Tyr Cys Ser Leu Leu Ile Lys Ser Ile Pro Leu Lys Gln
275 280 285

Gln Leu Gln Trp Asp Pro Ser Ser His Ser Leu Gln Gly Phe Met Asp
290 295 300

Phe Gly Leu Gly Lys Leu Asp Ala Asp Glu Thr Pro Leu Ala Ser Glu
305 310 315 320

Thr Val Leu Leu Met Ala Val Gly Ile Phe Gly His Trp Arg Thr Pro
325 330 335

Leu Gly Tyr Phe Phe Val Asn Arg Ala Ser Gly Tyr Leu Gln Ala Gln
340 345 350

Leu Leu Arg Leu Thr Ile Gly Lys Leu Ser Asp Ile Gly Ile Thr Val
355 360 365

Leu Ala Val Thr Ser Asp Ala Thr Ala His Ser Val Gln Met Ala Lys
370 375 380

Ala Leu Gly Ile His Ile Asp Gly Asp Asp Met Lys Cys Thr Phe Gln
385 390 395 400

His Pro Ser Ser Ser Ser Gln Gln Ile Ala Tyr Phe Phe Asp Ser Cys
405 410 415

His Leu Leu Arg Leu Ile Arg Asn Ala Phe Gln Asn Phe Gln Ser Ile
420 425 430

Gln Phe Ile Asn Gly Ile Ala His Trp Gln His Leu Val Glu Leu Val
435 440 445

Ala Leu Glu Glu Gln Glu Leu Ser Asn Met Glu Arg Ile Pro Ser Thr
450 455 460

Leu Ala Asn Leu Lys Asn His Val Leu Lys Val Asn Ser Ala Thr Gln
465 470 475 480

Leu Phe Ser Glu Ser Val Ala Ser Ala Leu Glu Tyr Leu Leu Ser Leu
485 490 495

Asp Leu Pro Pro Phe Gln Asn Cys Ile Gly Thr Ile His Phe Leu Arg
500 505 510

Leu Ile Asn Asn Leu Phe Asp Ile Phe Asn Ser Arg Asn Cys Tyr Gly
515 520 525

Lys Gly Leu Lys Gly Pro Leu Leu Pro Glu Thr Tyr Ser Lys Ile Asn
530 535 540

His Val Leu Ile Glu Ala Lys Thr Ile Phe Val Thr Leu Ser Asp Thr
545 550 555 560

Ser Asn Asn Gln Ile Ile Lys Gly Lys Gln Lys Leu Gly Phe Leu Gly
565 570 575

Phe Leu Leu Asn Ala Glu Ser Leu Lys Trp Leu Tyr Gln Asn Tyr Val
580 585 590

Phe Pro Lys Val Met Pro Phe Pro Tyr Leu Leu Thr Tyr Lys Phe Ser
595 600 605

His Asp His Leu Glu Leu Phe Leu Lys Met Leu Arg Gln Val Leu Val
610 615 620

Thr Ser Ser Ser Pro Thr Cys Met Ala Phe Gln Lys Ala Tyr Tyr Asn
625 630 635 640

Leu Glu Thr Arg Tyr Lys Phe Gln Asp Glu Val Phe Leu Ser Lys Val
645 650 655

Ser Ile Phe Asp Ile Ser Ile Ala Arg Arg Lys Asp Leu Ala Leu Trp
660 665 670

Thr Val Gln Arg Gln Tyr Gly Val Ser Val Thr Lys Thr Val Phe His
675 680 685

Glu Glu Gly Ile Cys Gln Asp Trp Ser His Cys Ser Leu Ser Glu Ala
690 695 700










Leu 


Leu 


Aso 


Leu 


Ser 


Asp 


His 


Arg 


Arg 


Asn 


XJCJ LX 


IXC 




Tyr Ala 


Gly 


705 










710 










71 R 










7 90 


Tvr 


Val 


Ala 


Asn 


Lys 
725 


Leu 


Ser 


Ala 


Leu 








O-L Li 


Asp 


Cys 
735 


-L lc 


Thr 


Ala 


Leu 


Tvr 
j 

740 


Ala 


Ser 


Asp 


Leu 


Lys 
74 S 


Ala 


v- 




lie 


Gly 

/ DU 


Ser 


Leu 


Leu 


Phe 


Val 
755 


Lys 


Lys 


Lys 


Asn 


Gly 
760 


Leu 


His 


Phe 


Pro 


Ser 

ice; 
/ Oj 


Glu 


Ser 


Leu 


Cys 


Arg 


Val 


He 


Asn 


He 


Cys 


Glu 


ni y 


Vpi 1 

V d_L 


v ci J_ 


A rrr 
jtV-L. y 


Th y- 
j. ill. 


His 


Ser 


r\JL y 




770 










775 










7ft n 








Met 


Ala 


lie 


Phe 


Glu 


Leu 


Val 








XT-L y 


vjjJL U. 


Leu 


Tyr 


Leu 


Gin 


785 










790 










7QR 










p nn 


Gin 


Lys 


lie 


Leu 


Cys 


Glu 


Leu 


Ser 




Hi c; 
nx o 


Tip 
11c 




leu 


Phe 


Val 


flop 










805 










pin 

O J. u 










815 


Val 


Asn 


Lys 


His 


T ifai 1 

X-tKZ LI 




A *^TTi 
rio ^— ' 




m n 


V ci-L 


Cys 


Ala 


He 


Asn 
830 


His 


fne 


Val 


Lys 


Leu 


Leu 


Lys 


Asp 


He 


He 


He 


Cys 


Phe 


Leu 


Asn 


He 


Arg 


Ala 






P "3 ^ 










o 4 U 










845 






Lys 


Asn 


Val 


Ala 


Gin 


Asn 


Pro 


Leu 


Lys 


His 


His 


Ser 


Glu 


Arg 


Thr 


Asp 




850 










855 










860 








Met 


Lys 


Thr 


Leu 


Ser 


Arg 


Lys 


His 


Trp 


Ser 


Pro 


Val 


Gin 


Asp 


Tyr 


Lys 


865 










870 










875 










880 


Cys 


Ser 


Ser 


Phe 


Ala 


Asn 


Thr 


Ser 


Ser 


Lys 


Phe Arg His 


Leu 


Leu 


Ser 










885 










890 










895 




Asn 


Asp 


Gly 


Tyr 
900 


Pro 


Phe 


Lys 





















<210> 12 

<211> 257 

<212> PRT 

<213> Homo sapiens 



<400> 12 



Met Pro Ala Arg Cys Val Ala Ala His Cys Gly Asn Thr Thr Lys Ser
1 5 10 15

Gly Lys Ser Leu Phe Arg Phe Pro Lys Asp Arg Ala Val Arg Leu Leu
20 25 30

Trp Asp Arg Phe Val Arg Gly Cys Arg Ala Asp Trp Tyr Gly Gly Asn
35 40 45

Asp Arg Ser Val Ile Cys Ser Asp His Phe Ala Pro Ala Cys Phe Asp
50 55 60

Val Ser Ser Val Ile Gln Lys Asn Leu Arg Phe Ser Gln Arg Leu Arg
65 70 75 80

Leu Val Ala Gly Ala Val Pro Thr Leu His Arg Val Pro Ala Pro Ala
85 90 95

Pro Lys Arg Gly Glu Glu Gly Asp Gln Ala Gly Arg Leu Asp Thr Arg
100 105 110

Gly Glu Leu Gln Ala Ala Arg His Ser Glu Ala Ala Pro Gly Pro Val
115 120 125

Ser Cys Thr Arg Pro Arg Ala Gly Lys Gln Ala Ala Ala Ser Gln Ile
130 135 140

Thr Cys Glu Asn Glu Leu Val Gln Thr Gln Pro His Ala Asp Asn Pro
145 150 155 160

Ser Asn Thr Val Thr Ser Val Pro Thr His Cys Glu Glu Gly Pro Val
165 170 175

His Lys Ser Thr Gln Ile Ser Leu Lys Arg Pro Arg His Arg Ser Val
180 185 190



Gly


Ile


Gln


Ala Lys


Val 


-U jr o 


rl-La. it i it; o±y 


Lys 


Arg 


Leu 


v^ys 


Asn Ala






195 




















Thr 


Thr 


Gln


Thr Glu 


Glu 




j.i.£J oci. rtiy 


i nr 


Ser 


Ser 


Leu 


Phe Asp




210 








215 






220 






Ile Tyr Ser Ser Asp Ser Glu Thr Asp Thr Asp Trp Asp Ile Lys Ser
225 230 235 240

Glu Gln Ser Asp Leu Ser Tyr Met Ala Val Gln Val Lys Glu Glu Thr
245 250 255

Cys

























<210> 13 
<211> 314 
<212> PRT 

<213> Homo sapiens 



<400> 13 



Met Pro Gly Phe Thr Cys Cys Val Pro Gly Cys Tyr Asn Asn Ser His
1 5 10 15

Arg Asp Lys Ala Leu His Phe Tyr Thr Phe Pro Lys Asp Ala Glu Leu
20 25 30

Arg Arg Leu Trp Leu Lys Asn Val Ser Arg Ala Gly Val Ser Gly Cys
35 40 45

Phe Ser Thr Phe Gln Pro Thr Thr Gly His Arg Leu Cys Ser Val His
50 55 60

Phe Gln Gly Gly Arg Lys Thr Tyr Thr Val Arg Val Pro Thr Ile Phe
65 70 75 80

Pro Leu Arg Gly Val Asn Glu Arg Lys Val Ala Arg Arg Pro Ala Gly
85 90 95

Ala Ala Ala Ala Arg Arg Arg Gln Gln Gln Gln Gln Gln Gln Gln Gln
100 105 110

Gln Gln Gln Gln Gln Gln Gln Gln Gln Gln Gln Gln Gln Gln Gln Gln
115 120 125

Gln Gln Gln Gln Ser Ser Pro Ser Ala Ser Thr Ala Gln Thr Ala Gln
130 135 140

Leu Gln Pro Asn Leu Val Ser Ala Ser Ala Ala Val Leu Leu Thr Leu
145 150 155 160

Gln Ala Thr Val Asp Ser Ser Gln Ala Pro Gly Ser Val Gln Pro Ala
165 170 175

Pro Ile Thr Pro Thr Gly Glu Asp Val Lys Pro Ile Asp Leu Thr Val
180 185 190

Gln Val Glu Phe Ala Ala Ala Glu Gly Ala Ala Ala Ala Ala Ala Ala
195 200 205

Ser Glu Leu Gln Ala Ala Thr Ala Gly Leu Glu Ala Ala Glu Cys Pro
210 215 220

Met Gly Pro Gln Leu Val Val Val Gly Glu Glu Gly Phe Pro Asp Thr
225 230 235 240

Gly Ser Asp His Ser Tyr Ser Leu Ser Ser Gly Thr Thr Glu Glu Glu
245 250 255

Leu Leu Arg Lys Leu Asn Glu Gln Arg Asp Ile Leu Ala Leu Met Glu
260 265 270

Val Lys Met Lys Glu Met Lys Gly Ser Ile Arg His Leu Arg Leu Thr
275 280 285

Glu Ala Lys Leu Arg Glu Glu Leu Arg Glu Lys Asp Arg Leu Leu Ala
290 295 300

Met Ala Val Ile Arg Lys Lys His Gly Met
305 310



<210> 14
<211> 761 
<212> PRT 
<213> Homo sapiens 

<400> 14 

Met Pro Asn Phe Cys Ala Ala Pro Asn Cys Thr Arg Lys Ser Thr Gln
1 5 10 15

Ser Asp Leu Ala Phe Phe Arg Phe Pro Arg Asp Pro Ala Arg Cys Gln
20 25 30

Lys Trp Val Glu Asn Cys Arg Arg Ala Asp Leu Glu Asp Lys Thr Pro
35 40 45

Asp Gln Leu Asn Lys His Tyr Arg Leu Cys Ala Lys His Phe Glu Thr
50 55 60

Ser Met Ile Cys Arg Thr Ser Pro Tyr Arg Thr Val Leu Arg Asp Asn
65 70 75 80

Ala Ile Pro Thr Ile Phe Asp Leu Thr Ser His Leu Asn Asn Pro His
85 90 95

Ser Arg His Arg Lys Arg Ile Lys Glu Leu Ser Glu Asp Glu Ile Arg
100 105 110

Thr Leu Lys Gln Lys Lys Ile Asp Glu Thr Ser Glu Gln Glu Gln Lys
115 120 125
130 135 ^ a Gln ASn Pr ° Ser Glu Glu Glu 

Gly Glu Gly Gln Asp Glu Asp Ile Leu Pro Leu Thr Leu Glu Glu Lys
145 150 155 160

Glu Asn Lys Glu Tyr Leu Lys Ser Leu Phe Glu Ile Leu Ile Leu Met
165 170 175

Gly Lys Gln Asn Ile Pro Leu Asp Gly His Glu Ala Asp Glu Ile Pro
180 185 190

Glu Gly Leu Phe Thr Pro Asp Asn Phe Gln Ala Leu Leu Glu Cys Arg
195 200 205

Ile Asn Ser Gly Glu Glu Val Leu Arg Lys Arg Phe Glu Thr Thr Ala
210 215 220

Val Asn Thr Leu Phe Cys Ser Lys Thr Gln Gln Arg Gln Met Leu Glu
225 230 235 240

Ile Cys Glu Ser Cys Ile Arg Glu Glu Thr Leu Arg Glu Val Arg Asp
245 250 255

Ser His Phe Phe Ser Ile Ile Thr Asp Asp Val Val Asp Ile Ala Gly
260 265 270

Glu Glu His Leu Pro Val Leu Val Arg Phe Val Asp Glu Ser His Asn
275 280 285

Leu Arg Glu Glu Phe Ile Gly Phe Leu Pro Tyr Glu Ala Asp Ala Glu
290 295 300

Ile Leu Ala Val Lys Phe His Thr Met Ile Thr Glu Lys Trp Gly Leu
305 310 315 320

Asn Met Glu Tyr Cys Arg Gly Gln Ala Tyr Ile Val Ser Ser Gly Phe
325 330 335

Ser Ser Lys Met Lys Val Val Ala Ser Arg Leu Leu Glu Lys Tyr Pro
340 345 350

Gln Ala Ile Tyr Thr Leu Cys Ser Ser Cys Ala Leu Asn Met Trp Leu
355 360 365

Ala Lys Ser Val Pro Val Met Gly Val Ser Val Ala Leu Gly Thr Ile
370 375 380

385 !S PhS H±S Ser Pro G1 » I*» Leu Leu Glu 

390 395 400

Leu Asp Asn Val Ile Ser Val Leu Phe Gln Asn Ser Lys Glu Arg Gly
405 410 415

Lys Glu Leu Lys Glu Ile Cys His Ser Gln Trp Thr Gly Arg His Asp
420 425 430

Ala Phe Glu Ile Leu Val Glu Leu Leu Gln Ala Leu Val Leu Cys Leu
435 440 445

Asp Gly Ile Asn Ser Asp Thr Asn Ile Arg Trp Asn Asn Tyr Ile Ala
450 455 460

Gly Arg Ala Phe Val Leu Cys Ser Ala Val Ser Asp Phe Asp Phe Ile
465 470 475 480

Val Thr Ile Val Val Leu Lys Asn Val Leu Ser Phe Thr Arg Ala Phe
485 490 495

Gly Lys Asn Leu Gln Gly Gln Thr Ser Asp Val Phe Phe Ala Ala Gly
500 505 510

Ser Leu Thr Ala Val Leu His Ser Leu Asn Glu Val Met Glu Asn Ile
515 520 525

Glu Val Tyr His Glu Phe Trp Phe Glu Glu Ala Thr Asn Leu Ala Thr
530 535 540

Lys Leu Asp Ile Gln Met Lys Leu Pro Gly Lys Phe Arg Arg Ala His
545 550 555 560

Gln Gly Asn Leu Glu Ser Gln Leu Thr Ser Glu Ser Tyr Tyr Lys Glu
565 570 575

Thr Leu Ser Val Pro Thr Val Glu His Ile Ile Gln Glu Leu Lys Asp
580 585 590

Ile Phe Ser Glu Gln His Leu Lys Ala Leu Lys Cys Leu Ser Leu Val
595 600 605

Pro Ser Val Met Gly Gln Leu Lys Phe Asn Thr Ser Glu Glu His His
610 615 620

Ala Asp Met Tyr Arg Ser Asp Leu Pro Asn Pro Asp Thr Leu Ser Ala
625 630 635 640

Glu Leu His Cys Trp Arg Ile Lys Trp Lys His Arg Gly Lys Asp Ile
645 650 655

Glu Leu Pro Ser Thr Ile Tyr Glu Ala Leu His Leu Pro Asp Ile Lys
660 665 670

Phe Phe Pro Asn Val Tyr Ala Leu Leu Lys Val Leu Cys Ile Leu Pro
675 680 685

Val Met Lys Val Glu Asn Glu Arg Tyr Glu Asn Gly Arg Lys Arg Leu
690 695 700

Lys Ala Tyr Leu Arg Asn Thr Leu Thr Asp Gln Arg Ser Ser Asn Leu
705 710 715 720

Ala Leu Leu Asn Ile Asn Phe Asp Ile Lys His Asp Leu Asp Leu Met
725 730 735

Val Asp Thr Tyr Ile Lys Leu Tyr Thr Ser Lys Ser Glu Leu Pro Thr
740 745 750

Asp Asn Ser Glu Thr Val Glu Asn Thr
755 760



<210> 15 
<211> 38 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Consensus sequence for PAR4 binding domain of THAP 

<221> UNSURE 

<222> (1)...(38)

<223> Xaa = Any Amino Acid

<400> 15

Leu Glu Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
1 5 10 15

Gln Arg Xaa Arg Arg Gln Xaa Arg Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
20 25 30

Xaa Xaa Xaa Gln Xaa Glu
35
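Every <400> block lists residues in three-letter code, with position numbers on separate lines, and the <211> field states the expected length. A quick consistency check when transcribing an entry is therefore to count residue tokens and compare the total with <211>. The sketch below uses hypothetical helpers that are not part of the application; it recognizes only the twenty standard codes plus Xaa, so an OCR misreading such as "Gin" for Gln or "He" for Ile is not counted and shows up as a length mismatch.

```python
import re

# The twenty standard three-letter codes plus Xaa (any amino acid).
RESIDUE = re.compile(
    r"\b(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|"
    r"Pro|Ser|Thr|Trp|Tyr|Val|Xaa)\b"
)


def count_residues(block: str) -> int:
    """Count residue tokens in a <400> block; position-number lines never match."""
    return len(RESIDUE.findall(block))


def matches_declared_length(block: str, declared: int) -> bool:
    """True when the residue count equals the length given in the <211> field."""
    return count_residues(block) == declared


# Transcribed from SEQ ID NO: 15 (<211> 38).
SEQ15 = """\
Leu Glu Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
1 5 10 15

Gln Arg Xaa Arg Arg Gln Xaa Arg Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
20 25 30

Xaa Xaa Xaa Gln Xaa Glu
35
"""

print(count_residues(SEQ15), matches_declared_length(SEQ15, 38))  # 38 True
```

Replacing a single "Gln" with "Gin" in `SEQ15` drops the count to 37, which is exactly the kind of silent OCR damage this check is meant to surface.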



<210> 16 

<211> 73 

<212> PRT 

<213> Sus scrofa 



<400> 16 

Met Val Gln Ser Cys Ser Ala Tyr Gly Cys Lys Asn Arg Tyr Asp Lys
1 5 10 15

Asp Lys Pro Val Ser Phe His Lys Phe Pro Leu Thr Arg Pro Ser Leu
20 25 30

Cys Lys Lys Trp Glu Ala Ala Val Arg Arg Lys Asn Phe Lys Pro Thr
35 40 45

Lys Tyr Ser Ser Ile Cys Ser Glu His Phe Thr Pro Asp Cys Phe Lys
50 55 60

Arg Glu Cys Asn Asn Lys Leu Leu Lys
65 70



<210> 17
<211> 99
<212> PRT
<213> Sus scrofa

<400> 17

Met Val Lys Cys Cys Ser Ala Ile Gly Cys Ala Ser Arg Cys Leu Pro
1 5 10 15

Asn Ser Lys Leu Lys Gly Leu Thr Phe His Val Phe Pro Thr Asp Glu
20 25 30

Lys Val Lys Arg Lys Trp Val Leu Ala Met Lys Arg Leu Asp Val Asn
35 40 45

Ala Ala Gly Met Trp Glu Pro Lys Lys Gly Asp Val Leu Cys Ser Arg
50 55 60

His Phe Lys Lys Thr Asp Phe Asp Arg Thr Thr Pro Asn Ile Lys Leu
65 70 75 80

Lys Pro Gly Val Ile Pro Ser Ile Phe Asp Ser Pro Ser His Leu ?L
85 90 95

Gly Glu Glu



<210> 18 
<211> 103 
<212> PRT 
<213> Sus scrofa 

<400> 18 

Met Pro Arg His Cys Ser Ala Ala Gly Cys Cys Thr Arg Asp Thr Arg
1 5 10 15

Glu Thr Arg Asn Arg Gly Ile Ser Phe His Arg Leu Pro Lys Lys Asp
20 25 30

Asn Pro Arg Arg Gly Leu Trp Leu Ala Asn Cys Gln Arg Leu Asp Pro
35 40 45

Ser Gly Gln Gly Leu Trp Asp Pro Ala Ser Glu Tyr Ile Tyr Phe Cys
50 55 60

Ser Lys His Phe Glu Glu Asn Cys Phe Glu Leu Val Gly Ile Ser Gly
65 70 75 80

Tyr His Arg Leu Lys Glu Gly Ala Val Pro Thr Ile Phe Glu Ser Phe
85 90 95

Ser Lys Leu Arg Arg Thr Ala
100



<210> 19 

<211> 99 

<212> PRT 

<213> Sus scrofa 



<400> 19

Met Thr Arg Ser Cys Ser Ala Val Gly Cys Ser Thr Arg Asp Thr Val
1 5 10 15

Leu Ser Arg Glu Arg Gly Leu Ser Phe His Gln Phe Pro Thr Asp Thr
20 25 30

Ile Gln Arg Ser Gln Trp Ile Arg Ala Val Asn Arg Met Asp Pro Arg
35 40 45

Ser Lys Lys Ile Trp Ile Pro Gly Pro Gly Ala Met Leu Cys Ser Lys
50 55 60

His Phe Gln Glu Ser Asp Phe Glu Ser Tyr Gly Ile Arg Arg Lys Leu
65 70 75 80

Lys Lys Gly Ala Val Pro Ser Val Ser Leu Tyr Lys Val Leu Gln Gly
85 90 95

Ala His Leu



<210> 20 

<211> 92 

<212> PRT 

<213> Bos taurus 

<400> 20 



Met Pro Lys Ser Cys Ala Ala Arg Gln Cys Cys Asn Arg Tyr Ser Asn
1 5 10 15

Arg Arg Lys Gln Leu Thr Phe His Arg Phe Pro Phe Ser Arg Pro Glu
20 25 30

Leu Leu Lys Glu Trp Val Leu Asn Ile Gly Arg Gly Asp Phe Glu Pro
35 40 45

Lys Gln His Thr Val Ile Cys Ser Glu His Phe Arg Pro Glu Cys Phe
50 55 60

Ser Ala Phe Gly Asn Arg Lys Asn Leu Lys His Asn Ala Val Pro Thr
65 70 75 80

Val Phe Ala Phe Gln Gly Pro Pro Gln Leu Val Arg
85 90



<210> 21 

<211> 75 

<212> PRT 

<213> Bos taurus 

<400> 21

Arg Leu Pro Lys Lys Asp Asn Pro Arg Arg Gly Leu Trp Leu Ala Asn
1 5 10 15

Cys Gln Arg Leu Asp Pro Ser Gly Gln Gly Leu Trp Asp Pro Ala Ser
20 25 30

Glu Tyr Ile Tyr Phe Cys Ser Lys His Phe Glu Glu Asn Cys Phe Glu
35 40 45

Leu Val Gly Ile Ser Gly Tyr His Arg Leu Lys Glu Gly Ala Val Pro
50 55 60

Thr Ile Phe Glu Ser Phe Ser Lys Leu Arg Arg
65 70 75



<210> 22 

<211> 91 

<212> PRT 

<213> Mus musculus 



<400> 22 



Met Val Gln Ser Cys Ser Ala Tyr Gly Cys Lys Asn Arg Tyr Asp Lys
1 5 10 15

Asp Lys Pro Val Ser Phe His Lys Phe Pro Leu Thr Arg Pro Ser Leu
20 25 30

Cys Lys Gln Trp Glu Ala Ala Val Lys Arg Lys Asn Phe Lys Pro Thr
35 40 45

Lys Tyr Ser Ser Ile Cys Ser Glu His Phe Thr Pro Asp Cys Phe Lys
50 55 60

Arg Glu Cys Asn Asn Lys Leu Leu Lys Glu Asn Ala Val Pro Thr Ile
65 70 75 80

Phe Leu Tyr Ile Glu Pro His Glu Lys Glu
85 90



<210> 23 

<211> 90 

<212> PRT 

<213> Mus musculus 

<400> 23 

Met Pro Thr Asn Cys Ala Ala Ala Gly Cys Ala Ala Thr Tyr Asn Lys 

1 5 10 15 

His Ile Asn Ile Ser Phe His Arg Phe Pro Leu Asp Pro Lys Arg Arg

20 25 30 

Lys Glu Trp Val Arg Leu Val Arg Arg Lys Asn Phe Val Pro Gly Lys 

35 40 45 

His Thr Phe Leu Cys Ser Lys His Phe Glu Ala Ser Cys Phe Asp Leu 

50 55 60 

Thr Gly Gln Thr Arg Arg Leu Lys Met Asp Ala Val Pro Thr Ile Phe
65 70 75 80 

Asp Phe Cys Thr His Ile Lys Ser Leu Lys
85 90 



<210> 24 

<211> 92 

<212> PRT 

<213> Mus musculus 

<400> 24 

Met Pro Lys Ser Cys Ala Ala Arg Gln Cys Cys Asn Arg Tyr Ser Ser
1 5 10 15

Arg Arg Lys Gln Leu Thr Phe His Arg Phe Pro Phe Ser Arg Pro Glu
20 25 30

Leu Leu Arg Glu Trp Val Leu Asn Ile Gly Arg Ala Asp Phe Lys Pro
35 40 45

Lys Gln His Thr Val Ile Cys Ser Glu His Phe Arg Pro Glu Cys Phe
50 55 60

Ser Ala Phe Gly Asn Arg Lys Asn Leu Lys His Asn Ala Val Pro Thr
65 70 75 80

Val Phe Ala Phe Gln Asn Pro Thr Glu Val Cys Pro
85 90



<210> 25 

<211> 95 

<212> PRT 

<213> Mus musculus 

<400> 25

Met Val Ile Cys Cys Ala Ala Val Asn Cys Ser Asn Arg Gln Gly Lys
1 5 10 15

Gly Glu Lys Arg Ala Val Ser Phe His Arg Phe Pro Leu Lys Asp Ser
20 25 30

Lys Arg Leu Ile Gln Trp Leu Lys Ala Val Gln Arg Asp Asn Trp Thr
35 40 45

Pro Thr Lys Tyr Ser Phe Leu Cys Ser Glu His Phe Thr Lys Asp Ser
50 55 60

Phe Ser Lys Arg Leu Glu Asp Gln His Arg Leu Leu Lys Pro Thr Ala
65 70 75 80

Val Pro Ser Ile Phe His Leu Ser Glu Lys Lys Arg Gly Ala Gly
85 90 95 



<210> 26 

<211> 52 

<212> PRT 

<213> Mus musculus 

<400> 26

Ile Leu Gln Ala Phe Gly Ser Leu Lys Lys Gly Asp Val Leu Cys Ser
1 5 10 15

Arg His Phe Lys Lys Thr Asp Phe Asp Arg Ser Thr Leu Asn Thr Lys
20 25 30

Leu Lys Ala Gly Ala Ile Pro Ser Ile Phe Glu Cys Pro Tyr His Leu
35 40 45

Gln Glu Lys Arg
50 



<210> 27 

<211> 103 

<212> PRT 

<213> Mus musculus 

<400> 27

Met Pro Arg His Cys Ser Ala Ala Gly Cys Cys Thr Arg Asp Thr Arg
1 5 10 15

Glu Thr Arg Asn Arg Gly Ile Ser Phe His Arg Leu Pro Lys Lys Asp
20 25 30

Asn Pro Arg Arg Gly Leu Trp Leu Ala Asn Cys Gln Arg Leu Asp Pro
35 40 45

Ser Gly Gln Gly Leu Trp Asp Pro Thr Ser Glu Tyr Ile Tyr Phe Cys
50 55 60

Ser Lys His Phe Glu Glu Asn Cys Phe Glu Leu Val Gly Ile Ser Gly
65 70 75 80

Tyr His Arg Leu Lys Glu Gly Ala Val Pro Thr Ile Phe Glu Ser Phe
85 90 95

Ser Lys Leu Arg Arg Thr Ala
100
<210> 28
<211> 90
<212> PRT
<213> Mus musculus
<400> 28

Met Pro Gly Phe Thr Cys Cys Val Pro Gly Cys Tyr Asn Asn Ser His
1 5 10 15

Arg Asp Lys Ala Leu His Phe Tyr Thr Phe Pro Lys Asp Ala Glu Leu
20 25 30

Arg Arg Leu Trp Leu Lys Asn Val Ser Arg Ala Gly Val Ser Gly Cys
35 40 45

Phe Ser Thr Phe Gln Pro Thr Thr Gly His Arg Leu Cys Ser Val His
50 55 60

Phe Gln Gly Gly Arg Lys Thr Tyr Thr Val Arg Val Pro Thr Ile Phe
65 70 75 80

Pro Leu Arg Gly Val Asn Glu Arg Lys Val
85 90



<210> 29 

<211> 96 

<212> PRT 

<213> Mus musculus 



<400> 29 



Met Pro Asn Phe Cys Ala Ala Pro Asn Cys Thr Arg Lys Ser Thr Gln
1 5 10 15

Ser Asp Leu Ala Phe Phe Arg Phe Pro Arg Asp Pro Ala Arg Cys Gln
20 25 30

Lys Trp Val Glu Asn Cys Arg Arg Ala Asp Leu Glu Asp Lys Thr Pro
35 40 45

Asp Gln Leu Asn Lys His Tyr Arg Leu Cys Ala Lys His Phe Glu Thr
50 55 60

Ser Met Ile Cys Arg Thr Ser Pro Tyr Arg Thr Val Leu Arg Asp Asn
65 70 75 80

Ala Ile Pro Thr Ile Phe Asp Leu Thr Ser His Leu Asn Asn Pro His
85 90 95



<210> 30 
<211> 24 
<212> PRT 

<213> Rattus norvegicus 
<400> 30 

Met Pro Thr Asn Cys Ala Ala Ala Gly Cys Ala Ala Thr Tyr Asn Lys 

1 5 10 15

His Ile Asn Ile Ser Phe His Arg
20 



<210> 31 
<211> 85 
<212> PRT 

<213> Rattus norvegicus 
<400> 31 

Arg Gln Cys Cys Asn Arg Tyr Ser Ser Arg Arg Lys Gln Leu Thr Phe
1 5 10 15

His Arg Phe Pro Phe Ser Arg Pro Glu Leu Leu Arg Glu Trp Val Leu
20 25 30

Asn Ile Gly Arg Ala Asp Phe Lys Pro Lys Gln His Thr Val Ile Cys
35 40 45

Ser Glu His Phe Arg Pro Glu Cys Phe Ser Ala Phe Gly Asn Arg Lys
50 55 60

Asn Leu Lys His Asn Ala Val Pro Thr Val Phe Ala Phe Gln Asn Pro
65 70 75 80

Ala Gln Val Cys Pro
85 



<210> 32 
<211> 70 
<212> PRT 

<213> Rattus norvegicus 
<400> 32 

Arg Phe Pro Leu Lys Asp Ser Lys Arg Leu Ile Gln Trp Leu Lys Ala
1 5 10 15

Val Gln Arg Asp Asn Trp Thr Pro Thr Lys Tyr Ser Phe Leu Cys Ser
20 25 30

Glu His Phe Thr Lys Asp Ser Phe Ser Lys Arg Leu Glu Asp Gln His
35 40 45

Arg Leu Leu Lys Pro Thr Ala Val Pro Ser Ile Phe His Leu Ser Glu
50 55 60

Lys Lys Arg Gly Ala Gly
65 70



<210> 33 
<211> 55 
<212> PRT 

<213> Rattus norvegicus 

<400> 33

Met Val Lys Cys Cys Ser Ala Ile Gly Cys Ala Ser Arg Cys Leu Pro
1 5 10 15

Asn Ser Lys Leu Lys Gly Leu Thr Phe His Val Phe Pro Thr Asp Glu
20 25 30

Asn Ile Lys Arg Lys Trp Val Leu Ala Met Lys Arg Leu Asp Val Asn
35 40 45

Thr Ala Gly Ile Trp Glu Pro
50 55 



<210> 34 
<211> 103 
<212> PRT 
<213> Rattus norvegicus

<400> 34

Met Pro Arg His Cys Ser Ala Ala Gly Cys Cys Thr Arg Asp Thr Arg
1 5 10 15

Glu Thr Arg Asn Arg Gly Ile Ser Phe His Arg Leu Pro Lys Lys Asp
20 25 30

Asn Pro Arg Arg Gly Leu Trp Leu Ala Asn Cys Gln Arg Leu Asp Pro
35 40 45

Ser Gly Gln Gly Leu Trp Asp Pro Thr Ser Glu Tyr Ile Tyr Phe Cys
50 55 60

Ser Lys His Phe Glu Glu Asn Cys Phe Glu Leu Val Gly Ile Ser Gly
65 70 75 80

Tyr His Arg Leu Lys Glu Gly Ala Val Pro Thr Ile Phe Glu Ser Phe
85 90 95

Ser Lys Leu Arg Arg Thr Ala
100



<210> 35
<211> 90
<212> PRT
<213> Rattus norvegicus
<400> 35

Met Pro Gly Phe Thr Cys Cys Val Pro Gly Cys Tyr Asn Asn Ser His
1 5 10 15

Arg Asp Lys Ala Leu His Phe Tyr Thr Phe Pro Lys Asp Ala Glu Leu
20 25 30

Arg Arg Leu Trp Leu Lys Asn Val Ser Arg Ala Gly Val Ser Gly Cys
35 40 45

Phe Ser Thr Phe Gln Pro Thr Thr Gly His Arg Leu Cys Ser Val His
50 55 60

Phe Gln Gly Gly Arg Lys Thr Tyr Thr Val Arg Val Pro Thr Ile Phe
65 70 75 80

Pro Leu Arg Gly Val Asn Glu Arg Lys Val
85 90

<210> 36 
<211> 96 
<212> PRT 

<213> Rattus norvegicus 
<400> 36 

Met Pro Asn Phe Cys Ala Ala Pro Asn Cys Thr Arg Lys Ser Thr Gln
1 5 10 15

Ser Asp Leu Ala Phe Phe Arg Phe Pro Arg Asp Pro Ala Arg Cys Gln
20 25 30

Lys Trp Val Glu Asn Cys Arg Arg Ala Asp Leu Glu Asp Lys Thr Pro
35 40 45

Asp Gln Leu Asn Lys His Tyr Arg Leu Cys Ala Lys His Phe Glu Thr
50 55 60

Ser Met Ile Cys Arg Thr Ser Pro Tyr Arg Thr Val Leu Arg Asp Asn
65 70 75 80

Ala Ile Pro Thr Ile Phe Asp Leu Thr Ser His Leu Asn Asn Pro His
85 90 95 



<210> 37 
<211> 94 
<212> PRT 

<213> Gallus gallus 
<400> 37 

Met Val Ile Cys Cys Ala Ala Ala Asn Cys Ser Asn Arg Gln Gly Lys
1 5 10 15

Ala Leu Arg Gly Ala Val Ser Phe His Arg Phe Pro Leu Lys Asp Ser
20 25 30

Lys Arg Leu Ile Gln Trp Leu Lys Ala Val Gln Arg Asp Asn Trp Thr
35 40 45

Pro Thr Lys Tyr Ser Phe Leu Cys Ser Glu His Phe Thr Lys Asp Ser
50 55 60

Phe Ser Arg Arg Leu Glu Asp Gln His Arg Leu Leu Lys Pro Thr Ala
65 70 75 80

Val Pro Thr Ile Phe Gln Leu Ala Glu Lys Lys Arg Asp Asn
85 90



<210> 38 
<211> 94 
<212> PRT 

<213> Gallus gallus 

<400> 38

Met Pro Arg Tyr Cys Ala Ala Ser Tyr Cys Lys Asn Arg Gly Gly Gln
1 5 10 15

Ser Ala Arg Asp Gln Arg Lys Leu Ser Phe Tyr Pro Phe Pro Leu His
20 25 30

Asp Lys Glu Arg Leu Glu Lys Trp Leu Arg Asn Met Lys Arg Asp Ala
35 40 45

Trp Thr Pro Ser Lys His Gln Leu Leu Cys Ser Asp His Phe Thr Pro
50 55 60

Asp Ser Leu Asp Val Arg Trp Gly Ile Arg Tyr Leu Lys His Thr Ala
65 70 75 80

Val Pro Thr Ile Phe Ser Ser Pro Asp Asp Glu Glu Lys Gly
85 90 



<210> 39 
<211> 102 
<212> PRT 

<213> Gallus gallus 

<400> 39

Met Pro Arg His Cys Ser Ala Ala Gly Cys Cys Thr Arg Asp Thr Arg
1 5 10 15

Glu Thr Arg Ser Arg Gly Ile Ser Phe His Arg Leu Pro Lys Lys Asp
20 25 30

Asn Pro Arg Arg Ala Leu Trp Leu Glu Asn Ser Arg Arg Arg Asp Ala
35 40 45

Ser Gly Glu Gly Arg Trp Asp Pro Ala Ser Lys Tyr Ile Tyr Phe Cys
50 55 60

Ser Gln His Phe Glu Lys Ser Cys Phe Glu Ile Val Gly Phe Ser Gly
65 70 75 80

Tyr His Arg Leu Lys Glu Gly Ala Val Pro Thr Val Phe Glu Ser Thr

85 90 95 

Ser Pro Arg Pro Pro Arg 
100 



<210> 40 
<211> 27 
<212> PRT 

<213> Gallus gallus 
<400> 40 

Met Thr Arg Ser Cys Ser Ala Leu Gly Cys Ser Ala Arg Asp Asn Gly
1 5 10 15

Arg Ser Arg Glu Arg Gly Ile Ser Phe His Gln
20 25



<210> 41 
<211> 90 
<212> PRT

<213> Xenopus laevis

<400> 41

Met Val Gln Ser Ocl TV 1 -j Tyr Gly Cys Lys Asn Arg Tyr Asp Lys
1 5 10 15

-^i-— > Pro Ile Ser Phe His Lys Phe Pro Leu Lys Arg Pro Leu Leu
20 25 30

Cys Lys Lys Trp Glu Ala Ala Val Arg Arg Ala Asp Phe Lys Pro Thr
35 40 45

Lys Tyr Ser Ser Ile Cys Ser Asp His Phe Thr Ala Asp Cys Phe Lys
50 55 60

Arg Glu Cys Asn Asn Lys Leu Leu Lys Asp Asn Ala Val Pro Thr Val
65 70 75 80

Phe Ala Leu Ala Glu Ile Lys Lys Lys Met
85 90



<210> 42 
<211> 103 
<212> PRT 

<213> Xenopus laevis

<400> 42

Met Pro Arg His Cys Ser Ala Leu Gly Cys Thr Thr Arg Asp Ser Arg
1 5 10 15

Gln Thr Arg Asn Asn Asn Ile Ser Phe His Arg Leu Pro Arg Lys Asp
20 25 30

Asp Pro Arg Arg Asn Leu Trp Ile Ala Asn Cys Gln Arg Thr Asp Pro
35 40 45

Ser Gly Lys Gly Leu Trp Asp Pro Ser Ser Asp Tyr Val Tyr Phe Cys
50 55 60

Ser Lys His Phe Glu Lys Ser Cys Phe Glu Val Val Gly Thr Ser Gly
65 70 75 80

Tyr His Arg Leu Lys Glu Asp Ala Val Pro Thr Leu Phe Leu Ser Ser
85 90 95

Ala Lys Leu Arg Arg Ala Ala
100



<210> 43 
<211> 90 
<212> PRT 

<213> Xenopus laevis
<400> 43

Met Val Arg Ser Cys Ser Ala Ala Asn Cys Val Asn Arg Gln Thr Ala
1 5 10 15

Leu Asn Lys Arg Lys Gly Ile Thr Phe His Arg Phe Pro Lys Glu Gln
20 25 30

Ala Arg Arg Gln Leu Trp Ile Thr Ala Val Thr His Ser His Ala Ala
35 40 45

Val Gly Thr Asp Trp Thr Pro Ser Ile His Ser Ser Leu Cys Ser Gln
50 55 60

His Phe Asn Asn Thr Gln Phe Asp Arg Thr Gly Gln Thr Val Arg Leu
65 70 75 80

Arg Asp Ser Ala Val Pro Thr Val Phe Ser
85 90



<210> 44 
<211> 99
<212> PRT
<213> Xenopus laevis

<400> 44

Met Pro Val Ser Cys Ala Ala Ser Gly Cys Lys Ser Arg Tyr Thr Met
1 5 10 15

Asp Ala Arg Glu Lys Gly Ile Thr Phe His Arg Phe Pro Arg Ser Asn
20 25 30

Pro Thr Leu Leu Glu Lys Trp Arg Leu Ala Met Arg Arg Ser Thr Arg
35 40 45

Asn Gly Glu Leu Trp Met Pro Ser Arg Tyr Gln Arg Leu Cys Ser Leu
50 55 60

His Phe Lys Gln Cys Cys Phe Asp Thr Thr Gly Gln Thr Lys Arg Leu
65 70 75 80

Arg Glu Asp Val Ile Pro Thr Ile Phe Asp Phe Pro Glu Glu Thr His
85 90 95

Val Ile Phe



<210> 45 
<211> 90 
<212> PRT 

<213> Xenopus laevis
<400> 45

Met Pro Ala Cys Ala Ala Ile Asn Cys Thr Ser Arg Gln Thr Arg Gly
1 5 10 15

Cys Gly Lys Ser Phe His Lys Phe Pro His Gly Arg Pro Glu Val Leu
20 25 30

Lys Lys Trp Val Met Asn Met Arg Arg Asp Lys Phe Lys Pro Ser Ser
35 40 45

Lys Ala Val Leu Cys Ser Asp His Phe Glu Glu Phe Cys Phe Asp Arg
50 55 60

Thr Gly Gln Thr Ile Arg Leu Arg Thr Asp Ala Val Pro Thr Val Phe
65 70 75 80

Thr Phe Pro Gly Lys Met Lys Lys Asp Arg
85 90



<210> 46 
<211> 105 
<212> PRT 

<213> Xenopus laevis
<400> 46

Met Pro His Cys Val Val Ser Asn Cys Val His Phe Asn Tyr Lys Lys
1 5 10 15

Ser Asn Leu His Gly Val Ala Leu His Pro Phe Pro Asn Asp Leu Ser
20 25 30

Arg Ile Lys Leu Trp Leu Gln Gln Ile Gly Leu Thr Thr Asp Glu Ile
35 40 45

Asp Tyr Leu Ala Gln Lys Val Val Glu Gly Lys Arg Lys Lys Thr Asp
50 55 60

Ser His Arg Met Cys Ser Ala His Phe Thr Pro Asn Cys Tyr Ile Val
65 70 75 80

Gln Asp Ala Lys Leu Val Leu Arg Ser Asp Ala Ile Pro Thr Met Phe
85 90 95

Pro Gly Leu Ser Ser Ser Thr Thr Asn
100 105
<210> 47
<211> 104
<212> PRT
<213> Xenopus laevis
<400> 47

11 tr u TD -y Lys Cys Ile Val Thr Lys Cys Pro His Lys Thr Gly Gln Lys
1 5 10 15

Glu Leu Tyr Pro Ser Val Ile Leu His Pro Phe Pro Gly Asn Ile Glu
20 25 30

Lys Ile Lys Gln Trp Leu Leu Gln Thr Gly Glu Asp Tyr Gly Asp Tyr
35 40 45

Glu Val Phe Ala Glu Lys Val Leu Glu Ala Lys Lys Thr Asp Ala Tyr
50 55 60

Arg Ile Cys Ser Arg His Phe Ala Glu Asp Gln Tyr Val Lys Arg Gly
65 70 75 80

Pro Arg Lys Leu Leu Ser Lys Asp Ala Val Pro Thr Ile Phe Ser Asn
85 90 95

Leu His Pro Leu Ile Gln Leu His
100
<210> 48 
<211> 102 
<212> PRT 

<213> Xenopus laevis
<400> 48

Met Pro Arg Cys Val Val Lys Asn Cys Pro His Trp Thr Gly Lys Lys
1 5 10 15

Gly Ser Gln Val Ile Leu His Gly Phe Pro Asn Asn Ser Arg Leu Ile
20 25 30

Lys Leu Trp Leu Ser Gln Thr Lys Gln Asp Phe Gly Asp Val Glu Asp
35 40 45

Phe Thr Gln Lys Ile Leu Glu Gly Lys Lys Asn Asp Leu Tyr Arg Leu
50 55 60

Cys Ser Lys His Phe Thr Asn Asp Ser Tyr Glu Ile Arg Gly Thr Lys
65 70 75 80

Arg Phe Leu Lys Tyr Gly Ala Val Pro Thr Val Phe Glu Asp Thr Pro
85 90 95

Pro Leu Lys Arg Arg Lys
100



<210> 49 
<211> 104 
<212> PRT 

<213> Xenopus laevis
<400> 49

Met Pro Asn Cys Ile Val Lys Asp Cys Arg His Lys Ser Gly Gln Lys
1 5 10 15

Ile Gln Asn Pro Asp Val Val Leu His Pro Phe Pro Asn Asn Ile Asn
20 25 30

Met Ile Lys Asn Trp Leu Leu Gln Thr Gly Gln Asp Phe Gly Asp Ile
35 40 45

Asp Val Leu Ala Asp Lys Ile Leu Lys Gly Lys Lys Thr Ala Asn Phe
50 55 60

Arg Met Cys Ser Cys His Phe Thr Arg Asp Ser Tyr Met Ala Arg Gly
65 70 75 80

Ser Lys Thr Thr Leu Lys Pro Asn Ala Ile Pro Thr Ile Phe Pro Val
85 90 95

Ile Leu Pro Thr Thr Val Pro Ser
100



<210> 50 
<211> 99 
<212> PRT 

<213> Xenopus laevis
<400> 50

Met Pro Lys Cys Phe Val Gln Ser Cys Pro His Tyr Thr Gly Arg Asn
1 5 10 15

Gly Lys Pro Asp Asn Val Ile Leu His Thr Phe Pro Arg Cys Lys Lys
20 25 30

Gln Val Gln Val Trp Leu Ser Arg Thr Gly Glu Arg Tyr Glu Asn Met
35 40 45

Ala Glu Phe Val Thr Tyr Ile Thr Gln Arg Cys Ser Asn Phe Arg Met
50 55 60

Cys Ser Glu His Phe Thr Asp Asp Cys Tyr Ile Thr Val Glu Gly Lys
65 70 75 80

Arg Arg Leu Met Glu Asn Ser Ala Pro Thr Ile Phe Lys Thr Thr Phe
85 90 95

Arg Gln Asn



<210> 51 
<211> 104 
<212> PRT 

<213> Xenopus laevis
<400> 51

Met Thr Lys Cys Ile Val Lys Gly Cys Arg His Thr Thr Gly Gln Lys
1 5 10 15

Leu Lys Phe Pro His Ile Val Met His Ala Phe Pro Ser Asn Leu Lys
20 25 30

Met Ile Lys Val Trp Leu Lys Gln Thr Gly Gln Tyr Gly Asn Asn Leu
35 40 45

Glu Glu Met Ala Leu Lys Val Leu Gly Gly Lys Lys Ser Asp Ser Tyr
50 55 60

Arg Leu Cys Ser Ala His Phe Thr Val Asp Ser Tyr Ala Leu Arg Arg
65 70 75 80

Ser Lys Asn Met Leu Lys Lys Asp Ala Phe Pro Thr Leu Phe Gly Gln
85 90 95

Asn Gln Ile Asn Ala Ala Asn Val
100



<210> 52 
<211> 84 
<212> PRT 

<213> Xenopus laevis
<400> 52

Met Pro Lys Cys Ile Val Ile His Cys Pro His Ser Cys Ser Lys Lys
1 5 10 15

Val Thr Lys Asn Thr Gly Val Val Met His Thr Phe Pro Phe Asn Leu
20 25 30

Asp Arg Ile Lys Asn Trp Leu Leu Ser Ile Asp Gln Asn Phe Gly Asn
35 40 45

Ile Asp Thr Leu Ala Asn Arg Ile Leu Glu Glu Lys Lys Lys His Ser
50 55 60

Asp Leu Tyr Arg Leu Cys Ser Glu His Phe Thr Pro Gln Cys Tyr Ile
65 70 75 80

Ser Thr Gly Glu



<210> 53 
<211> 104 
<212> PRT 

<213> Xenopus laevis
<400> 53

Met Pro Ser Cys Ile Val Lys Gly Cys Pro His Arg Thr Gly Gln Lys
1 5 10 15

Asp Lys Phe Pro Asn Val Thr Leu His Asn Phe Pro Lys Thr Ile Pro
20 25 30

Lys Ile Lys Asn Trp Leu Trp Gln Thr Gly Gln Tyr Gly Glu Asp Ser
35 40 45

Asp Ala Ile Ala Glu Glu Ile Leu Gln Gly Leu Lys Thr Cys Arg His
50 55 60

Arg Met Cys Ser Met His Phe Ser Glu Asn Cys Phe Ile Thr Leu Gly
65 70 75 80

Ser Lys Arg Val Leu Thr Arg Asn Ala Val Pro Thr Ile Phe Lys Pro
85 90 95

Gln Thr Thr Pro Ala Ile Leu Ala
100

<210> 54 
<211> 104 
<212> PRT 

<213> Xenopus laevis
<400> 54

Met Pro Lys Cys Ile Leu Asn Gly Cys Pro Tyr Arg Thr Gly Gln Lys
1 5 10 15

Leu Lys Phe Pro Asp Ile Val Leu His Pro Phe Pro Lys Ser Met Glu
20 25 30

Met Ile Arg Asn Trp Leu Phe Gln Thr Gly Gln His Ala Glu Asp Val
35 40 45

Glu Ser Leu Ser Gln Arg Ile Tyr Gln Gly Leu Lys Thr Ser Asn Phe
50 55 60

Arg Met Cys Ser Lys His Phe Thr Gln Asp Cys Tyr Met Gln Val Gly
65 70 75 80

Ser Arg Lys Cys Leu Lys Pro Asn Ala Val Pro Thr Val Phe Glu Ser
85 90 95

Tyr Asn Val Pro Val Thr Thr Phe
100



<210> 55 
<211> 105 
<212> PRT 

<213> Xenopus laevis
<400> 55

Asn Asn Ala Ser Cys Ile Val Arg Gly Cys His His Ser Thr Ala Arg
1 5 10 15

Lys Cys Leu Ser Pro Gly Ile Ala Leu His Gly Phe Pro Asn Asn Leu
20 25 30

Ser Arg Ile Lys Gln Trp Leu Val Asn Ile Gly Gln Asn Val Gly Asp
35 40 45

Ile Asp Asp Phe Ala Gln Lys Val Leu Asp Gly Lys Lys Gln Asn Ser
50 55 60

Tyr Arg Ile Cys Ser Ala His Phe Ser Ser Asp Cys Phe Val Gln Phe
65 70 75 80

Gly Tyr Ser Lys Gly Leu Lys Ala Asp Ala Val Pro Thr Ile Phe Ala
85 90 95

Trp Asn Thr Pro Glu Ser Arg Gly Arg
100 105



<210> 56 
<211> 107 
<212> PRT 

<213> Xenopus laevis

<400> 56

Met Pro Ser Cys Ile Val Lys Gly Cys Arg His Lys Ser Gly Gln Lys
1 5 10 15

Val Leu Tyr Pro Asp Val Val Leu His Ser Phe Pro Asn Asn Ile His
20 25 30

Met Ile Lys Asn Trp Leu Leu Gln Thr Gly Gln Val Phe Gly Asp Ile
35 40 45

Asp Ala Phe Ala Glu Lys Val Leu Lys Gly Asn Lys Thr Ser Ala Phe
50 55 60

Arg Met Cys Ser Arg His Phe Thr Arg Asp Ser Tyr Met Ala Lys Gly
65 70 75 80

Kr Lys Ile Thr Leu Lys Pro Asn Ala Val Pro Thr Ile Phe Asn Thr
85 90 95

Leu Pro Pro Ala Ala Ala Val Pro Ser Leu Met
100 105



<210> 57 

<211> 91 

<212> PRT 

<213> Danio rerio

<400> 57

Met Val Gln Ser Cys Ser Ala Tyr Gly Cys Asn Asn Arg Tyr Gln Lys
1 5 10 15

Asp Arg Ile Ile Ser Phe His Lys Phe Pro Leu Ala Arg Pro Glu Val
20 25 30

Cys Val Gln Trp Val Ser Ala Met Ser Arg Arg Asn Phe Lys Pro Thr
35 40 45

Lys Tyr Ser Asn Ile Cys Ser Gln His Phe Thr Ser Asp Cys Phe Lys
50 55 60

Gln Glu Cys Asn Asn Arg Val Leu Lys Asp Asn Ala Val Pro Ser Leu
65 70 75 80

Ile Thr Leu Gln Thr Gln Asp Pro Phe Ser Ala
85 90



<210> 58 

<211> 103 

<212> PRT 

<213> Danio rerio 
<400> 58

Met Pro Arg His Cys Ser Ala Val Gly Cys
1 5 10

Asp Val Arg Lys Ser Gly Ile Thr Phe His Arg Leu Pro Lys Lys Gl'
20 25 30

Asn Pro Arg Arg Thr Thr Trp Ile Ile Asn Ser Arg Arg Lys Gly Pro
35 40 45

O _L U. Gly Lys Gly Gln Trp Asp Pro Gln Ser Gly Phe Ile Tyr Phe Cys
50 55 60

Ser Lys His Phe Thr Pro Asp Ser Phe Glu Leu Ser Gly Val Ser Gl^
65 70 75 80

Tyr His Arg Leu Lys Asp Asp Ala Ile Pro Thr Val Phe Glu Ile Gli
85 90 95

His Pro Lys Lys Gly Thr Ala
100



<210> 59 

<211> 90 

<212> PRT 

<213> Danio rerio 



<400> 59

Met Pro Gly Phe Thr Cys Cys Val Pro Gly Cys Tyr Asn Asn Ser His
1 5 10 15

Arg Asp Arg Asp Leu Arg Phe Tyr Thr Phe Pro Lys Asp Pro Thr Gln
20 25 30

Arg Glu Ile Trp Leu Lys Asn Ile Ser Arg Ala Gly Val Ser Gly Cys
35 40 45

Phe Ser Thr Phe Gln Pro Thr Thr Gly His Arg Val Cys Ser Val His
50 55 60

Phe Pro Gly Gly Arg Lys Thr Tyr Thr Ile Arg Val Pro Thr Leu Phe
65 70 75 80

Pro Leu Arg Gly Val Asn Glu Arg Arg Ser
85 90



<210> 60 

<211> 96 

<212> PRT 

<213> Danio rerio 



<400> 60 

Met Pro Asn Phe Cys Ala Ala Leu Asn Cys Ser Arg Asn Ser Thr His 

1 5 10 15 

Ser Val Leu Ala Phe Phe Arg Phe Pro Arg Asp Pro Glu Arg Cys Lys 

20 25 30 

Lys Trp Val Glu Asn Cys Ser Arg Ser Asp Leu Lys Asp Lys Thr Pro 

35 40 45 

Asp His Leu Asn Lys Tyr His Arg Leu Cys Ala Arg His Phe Glu Pro 

50 55 60 

Asn Leu Ile Thr Lys Thr Ser Pro Phe Arg Thr Val Leu Lys Asp Ser
65 70 75 80

Ala Val Pro Thr Ile Phe Asp Asn Pro Phe Lys Arg Ser Asn Asn Glu
85 90 95



<210> 61 
<211> 99 
<212> PRT 
<213> Danio rerio

<400> 61

Met Pro Tyr Lys Cys Val Ala Tyr Gly Cys Gly Lys Ile Ser Gly Gln
1 5 10 15

Asn Val Ser Met Phe Arg Phe Pro Lys Asp Pro Glu Glu Phe Ser Lys
20 25 30

Trp Gln Arg Gln Val Gln Lys Thr Arg Arg Asn Trp Leu Ala Asn Thr
35 40 45

Tyr Ser His Leu Cys Asn Glu His Phe Thr Lys Asp Cys Phe Glu Pro
50 55 60

Lys Thr Tyr Val Thr Ala Lys Ala Ser Gly Phe Lys Arg Leu Lys Leu
65 70 75 80

Lys Asp Gly Ala Val Pro Thr Val Phe Ile Arg Arg Arg Cys Arg Lys
85 90 95

Cys Gly Gly



<210> 62 

<211> 90 

<212> PRT 

<213> Danio rerio

<400> 62

Met Gly Gly Cys Ser Ala Pro Asn Cys Ser Asn Ser Thr Thr Ile Gly
1 5 10 15

Lys Gln Leu Phe Arg Phe Pro Lys Asp Pro Val Arg Met Arg Lys Trp
20 25 30

Leu Val Asn Cys Arg Arg Asp Phe Val Pro Thr Pro Cys Ser Arg Leu
35 40 45

Cys Gln Asp His Phe Glu Glu Ser Gln Phe Glu Glu Ile Ala Arg Ser
50 55 60

Pro Ala Gly Gly Arg Lys Leu Lys Pro Asn Ala Ile Pro Thr Leu Phe
65 70 75 80

Asn Val Pro Asp Pro Pro Ser Pro Val Thr
85 90



<210> 63 

<211> 105 

<212> PRT 

<213> Danio rerio 

<400> 63

Met Val Leu Asn Cys Ala Tyr Pro Gly Cys Leu Asn Leu Phe Lys Lys
1 5 10 15

Glu Arg Leu Arg Ser Asn Ser Ser Ser His Gly Gly Lys Leu Thr Phe
20 25 30

His Arg Phe Pro Thr Leu Glu Pro Gly Arg Leu Leu Leu Trp Arg Ala
35 40 45

Ala Leu Gly Met Asp Pro Asp Thr Pro Met Arg Ser Leu Arg Val Trp
50 55 60

Arg Ile Cys Ser Glu His Phe Ser Pro Glu Asp Phe Arg Ala Val Asn
65 70 75 80

Gly Asn Lys Val Leu Leu Lys Ala Ser Ala Val Pro Arg Val Tyr Ser
85 90 95

Thr Pro Ala Pro Gly Ser Arg Ala Asp
100 105 
<210> 64

<211> 99
<212> PRT
<213> Danio rerio

<400> 64

jyie u Ser Ser Arg Arg Cys Tyr Cys Ser Val Pro Gly Cys Ser Asn
1 5 10 15

Ser Lys Lys Arg His Pro Tyr Leu Ser Phe His Asp Phe Pro Lys Asp
20 25 30

Glu Gly Gln Arg Lys Ser Trp Val Lys Phe Ile Arg Arg Glu Glu Gly
35 40 45

Pro Phe Phe Gln Ile Lys Arg Gly Ser Thr Phe Val Cys Ser Met His
50 55 60

Phe Lys Ala Asp Asp Ile Tyr Thr Thr Ile Ser Gly Arg Arg Lys Ile
65 70 75 80

Asn Pro Gly Ala Ala Pro Arg Leu Phe Ser Trp Asn Asn Trp Ser Thr
85 90 95

Asp Lys Val



<210> 65 

<211> 66 

<212> PRT 

<213> Danio rerio 



<400> 65 

Phe Pro Lys Glu Asn Val Leu Arg Lys Gln Trp Glu Ile Ala Leu Lys
1 5 10 15

Arg Lys Gly Phe Ser Ala Ser Glu Ser Ser Val Leu Cys Ser Glu His
20 25 30

Phe Arg Pro Gln Asp Leu Asp Arg Thr Gly Gln Thr Val Arg Val Arg
35 40 45

Asp Gly Ala Lys Pro Ser Val Phe Ser Phe Pro Ala His Met Gln Lys
50 55 60

His Val
65



<210> 66 

<211> 93 

<212> PRT 

<213> Danio rerio 



<400> 66

Ser Ser Glu His Cys Cys Val Pro Leu Cys Gly Ala Ser Ser Arg Phe
1 5 10 15

Asn Ser Ala Val Ser Phe His Thr Phe Pro Val Ser Thr Glu Ile Arg
20 25 30

Glu Lys Trp Ile Lys Asn Ile Arg Arg Glu Lys Leu Asn Ile Thr Tyr
35 40 45

His Thr Arg Val Cys Cys Arg His Phe Thr Thr Asp Asp Leu Ile Gln
50 55 60

Pro Arg Asn Pro Ile Gly Arg Arg Leu Leu Arg Lys Gly Ala Val Pro
65 70 75 80

Thr Leu Phe Lys Trp Asn Gly Tyr Ser Asp Ala Glu Ala
85 90



<210> 67 
<211> 93
<212> PRT
<213> Danio rerio

<400> 67

Met Pro Asp Phe Cys Ala Ala Tyr Gly Cys Ser Asn Glu Arg Thr Lys
1 5 10 15

Lys Leu Lys Asp Lys Gly Ile Thr Phe His Arg Phe Pro Arg Asp Val
20 25 30

Lys Arg Arg Gln Ala Trp Thr Leu Ala Leu Arg Arg Asp Lys Phe Glu
35 40 45

Pro Lys Pro Arg Ser Leu Leu Cys Ser Cys His Phe Arg Pro Glu Asp
50 55 60

Phe Asp Arg Thr Gly Gln Thr Val Arg Leu Arg Asp Gly Val Ile Pro
65 70 75 80

Ser Ile Phe Asn Phe Ser Asn Pro Leu Ser Lys Leu Ser
85 90



<210> 68 

<211> 97 

<212> PRT 

<213> Danio rerio 



<400> 68

Met Pro Val Cys Ser Ala Tyr Lys Cys Lys Lys Arg Ser Asp Arg Glu
1 5 10 15

Tyr Lys Glu Ala Tyr Lys Arg Gly Glu Phe Ser Phe His Lys Phe Pro
20 25 30

Leu Glu Asp Gly Leu Arg Val Arg Glu Trp Leu Arg Arg Met Arg Trp
35 40 45

Gln Asn Trp Trp Pro Thr Gly Asn Ser Val Leu Cys Ser Asp His Phe
50 55 60

Glu Lys Asp Cys Phe Glu Gln Val Gly Ser His Lys Arg Leu Arg Lys
65 70 75 80

Ser Ala Val Pro Thr Ile Phe Asn Phe Pro Lys His Leu Gln Trp Lys
85 90 95

Val



<210> 69 

<211> 90 

<212> PRT 

<213> Danio rerio 

<400> 69

Met Val Leu Val Cys Ser Ala Tyr Asn Cys Lys Asn Thr Leu Arg Asn
1 5 10 15

Lys Ser Val Ser Phe His Leu Phe Pro Leu Lys Asp Pro Ser Leu Leu
20 25 30

Lys Lys Trp Leu Lys Asn Leu Arg Trp Lys Asp Trp Lys Pro Asn Pro
35 40 45

Asn Ser Lys Ile Cys Ser Ala His Phe Glu Glu Lys Cys Phe Ile Leu
50 55 60

Glu Gly Lys Lys Thr Arg Leu His Thr Trp Ala Val Pro Thr Ile Phe
65 70 75 80

Ser Phe Pro Asn Arg Phe Ser Glu Arg Asn
85 90

<210> 70

<211> 107 
<212> PRT 
<213> Danio rerio 

<400> 70

Met Asn Ser Ile Ser Leu Lys Tyr Leu Arg Arg Glu Cys Ala Tyr Ser
1 5 10 15

Arg Tyr Cys Cys Val Pro Phe Cys Lys Ile Ser Ser Arg Phe Asn Ser
20 25 30

Val Ile Ser Phe His Lys Leu Pro Leu Asp Arg Ala Thr Arg Lys Met
35 40 45

Trp Leu His Asn Ile Arg Arg Lys Thr Phe Glu Val Ser Pro His Val
50 55 60

Arg Val Cys Ser Arg His Phe ■flop Asp Phe Ile Glu Pro Ser
65 70 75 80

Tyr Pro Thr Ala Arg Arg Leu Leu Lys Lys Gly Ala Val Pro Thr Leu
85 90 95

Phe Arg Trp Asn Asn Asp Ser Thr Ser Gly Gln
100 105

<210> 71

<211> 89

<212> PRT

<213> Danio rerio

<400> 71

Leu Arg Leu Arg Gln Ser Ala Ser Ser His Glu Glu Ser Leu Thr Phe
1 5 10 15

Tyr Ser Leu Pro Leu Gln Asp Phe Lys Arg Leu Asn Leu Trp Leu Asn
20 25 30

Ala Val Arg Arg Asp Thr Lys Ser Ser Ile Arg Asn Ile Arg Gly Leu
35 40 45

Arcr V;=3 1 fvs Ser Glu His Phe Ala Asp Phe Ser Leu Asn Arg
50 55 60

Gly Ser Lys Arg Arg Leu Lys Ser Thr Ala Val Pro Lys Cys Asn Glu
65 70 75 80

Ala Leu Pro Gln Ile Arg Arg Ala Gly
85

<210> 72

<211> 105

<212> PRT

<213> Danio rerio

<400> 72

Met Val Ile Thr Cys Ala Cys Pro Gly Cys Asp Asn Arg Tyr Lys Thr
1 5 10 15

Leu Arg Leu Arg Ser Asp Ser Lys Phe His Pro Gly Lys Leu Thr Phe
20 25 30

His Lys Phe Pro Thr Ser Asp Pro Glu Arg Leu Lys Leu Trp Leu Leu
35 40 45

Ala Leu Gly Leu Asp Ile Asn Thr Pro Leu Ser Val Leu Glu Thr Arg
50 55 60

Arg Ile Cys Ser Asp His Phe Ser Pro Phe Asp Phe Lys Asp Thr Lys
65 70 75 80

Gly Ser Ile Val Gln Leu Lys Ser Trp Ala Val Pro Met Asn Leu Ser
85 90 95

Glu Gln Phe Val Asp Asp Pro Ser Lys
100 105



<210> 73 

<211> 96 

<212> PRT 

<213> Danio rerio 

<400> 73

Met Pro Asp Cys Cys Ala Ala Ala Asn Cys Lys Gln Ser Thr Asp Gln
1 5 10 15

Ser Ser Val Ser Phe Phe Glu Phe Pro Leu Asp Pro Asp Arg Cys Arg
20 25 30

Gln Trp Val Gly Arg Cys Asn Arg Pro Asp Leu Gln Thr Lys Thr Pro
35 40 45

Glu Asp Leu His Lys Asn Tyr Lys Val Cys Ser Arg His Phe Glu Thr
50 55 60

Ser Met Ile Cys Gln Gln Ser Ala Val Lys Cys Ile Leu Lys Asp Asp
65 70 75 80

Ala Val Pro Thr Leu Phe Asn Phe Ser Thr Asn Gln Asp Asn Ala Gln
85 90 95



<210> 74 

<211> 91 

<212> PRT 

<213> Danio rerio 

<400> 74

Met Val Lys Cys Thr Val Gln Gly Cys Ile Asn Phe Ser Asp Leu Arg
1 5 10 15

Pro Glu Glu Gln Pro Asn Arg Pro Arg Lys Arg Phe Phe Arg Phe Pro
20 25 30

Lys Asp Lys Val Leu Val Lys Val Trp Leu Ala Ala Leu Arg Asp Thr
35 40 45

Glu Arg Glu Ile Thr Asp Leu His Arg Ile Cys Glu Asp His Phe Leu
50 55 60

Ser His His Ile Thr Ala Asp Gly Ile Ser Pro Asp Ala Ile Pro Ile
65 70 75 80

Met Pro Pro Leu Asp Gly Pro Val Gly Asn Trp
85 90



<210> 75 

<211> 84 

<212> PRT 

<213> Danio rerio 

<400> 75

Met Pro Ile Ser Cys Ser Ala Val Asp Cys Ser Asn Arg Phe Val Lys
1 5 10 15

Gly Ser Glu Ile Arg Phe Tyr Arg Phe Pro Ile Ser Lys Pro Gln Leu
20 25 30

Ala Glu Gln Trp Val Arg Ser Leu Gly Arg Lys Asn Phe Val Pro Thr
35 40 45

Gln Asn Ser Cys Leu Cys Ser Glu His Phe Gln Pro Asp Cys Phe Arg
50 55 60

Asp Tyr Asn Gly Lys Leu Phe Leu Arg Glu Asp Ala Val Pro Thr Ile
65 70 75 80

Phe Ser Asn Ser

<210> 76
<211> 95 
<212> PRT 

<213> Oryzias latipes 



<400> 76

Met Pro Asn Phe Cys Ala Ala Pro Asn Cys Thr Arg Lys Ser Thr Gln
1 5 10 15

Ser Asp Leu Ala Phe Phe Arg Phe Pro Arg Asp Pro Glu Arg Cys Arg
20 25 30

Ile Trp Val Glu Asn Cys Arg Arg Ala Asp Leu Glu Ala Lys Thr Ala
35 40 45

Asp Gln Leu Asn Lys His Tyr Arg Leu Cys Ala Lys His Phe Asp Pro
50 55 60

Ala Met Val Cys Lys Thr Ser Pro Tyr Arg Thr Val Leu Lys Asp Thr
65 70 75 80

Ala Ile Pro Thr Ile Phe Asp Leu Thr Ser His Leu Lys Asn Pro
85 90 95



<210> 77 
<211> 90 
<212> PRT 

<213> Oryzias latipes 



<400> 77

Met Pro Thr Gly Cys Ala His Ala Asn Cys Arg Asn Val Val Gly Lys
1 5 10 15

Phe Arg Gly Val Thr Phe His Lys Phe Pro Arg Asp Pro Glu Lys Leu
20 25 30

Ser Arg Trp Thr Lys Phe Met Lys Arg His Glu Ser Trp Val Pro Lys
35 40 45

Tyr Tyr Asp Arg Val Cys Ser Val His Phe Ser Ser Glu His Phe Asp
50 55 60

Arg Thr Gly Gln Thr Val Arg Leu Arg Asp Asn Ala Glu Pro Ser Leu
65 70 75 80

Pro His Leu Pro Trp Arg Phe Pro Lys Ser
85 90



<210> 78 
<211> 94 
<212> PRT 

<213> Oryzias latipes 
<400> 78 

Met Gln Asn Arg Cys Ala Val Leu Thr Cys Pro Ser Gly Lys Thr Asp
1 5 10 15

Phe Gln Pro Met Phe Arg Phe Pro His Asp Gln Glu Arg Ser Arg Arg
20 25 30

Trp Val Glu Lys Cys Gln Gly Glu Asn Leu Ile Gly Lys Ser Pro Glu
35 40 45

Gln Leu Tyr Arg Tyr Tyr Arg Ile Cys Lys Arg His Phe Glu Thr Ser
50 55 60

Ala Phe Asp Cys Asp Ala Asp Gly Ala Val Leu Lys Lys Asp Ala Val
65 70 75 80

Pro Thr Ile Phe Asp Ala Ser Val Pro Pro Gln Ser Ser Gln
85 90

<210> 79
<211> 92 
<212> PRT 

<213> Drosophila melanogaster 
<400> 79 

Met Pro Ala His Cys Ala Val Ile Asn Cys Ser His Lys Tyr Val His
1 5 10 15

Ala Gly Ser Ile Ser Phe His Arg Phe Pro Phe Lys Arg Lys Asp Leu
20 25 30

Leu Gln Lys Trp Lys Glu Phe Thr Gln Arg Ser Ala Gln Trp Met Pro
35 40 45

Ser Lys Trp Ser Ala Leu Cys Ser Arg His Phe Gly Asp Glu Asp Phe
50 55 60

Asn Cys Ser Asn Asn Arg Lys Thr Leu Lys Lys Asn Ala Val Pro Ser
65 70 75 80

Ile Arg Val Ser Glu Asp Asp Ser Met Ser Gly His
85 90



<210> 80 
<211> 90 
<212> PRT 

<213> Drosophila melanogaster 
<400> 80 

Met Pro Thr Ile Arg Arg Cys Cys Ile Ile Gly Cys Leu Ser Asn Ser
1 5 10 15

Arg Gln His Pro Ser Met Gln Phe Phe Ala Phe Pro Arg Pro Glu Asn
20 25 30

Pro Phe His Lys Leu Trp Lys Glu Ala Cys His Ala Ser Leu Arg Arg
35 40 45

Ile Val Pro Phe Lys Lys Pro Val Val Cys Ala Leu His Phe Asp Pro
50 55 60

Ser Val Leu Gly Gly Arg Arg Leu Gln Ser Asn Ala Leu Pro Thr Leu
65 70 75 80

Arg Leu Glu Val Pro Ser Asn Leu Glu Ala
85 90



<210> 81 
<211> 104 
<212> PRT 

<213> Drosophila melanogaster 



<400> 81

Met Arg Cys Ala Val Pro Asn Cys Arg Asn Phe Ser Asp Cys Arg Ser
1 5 10 15

Lys Arg Asn Ala Ala Gln Gln Gln Arg Leu Gly Phe Phe Arg Phe Pro
20 25 30

Lys Cys Pro Asp Thr Phe Lys Ala Trp Leu Ala Phe Cys Gly Tyr Thr
35 40 45

Glu Glu Ser Leu Lys Leu Lys Asn Pro Cys Ile Cys Ile Glu His Phe
50 55 60

Lys Asp Glu Asp Ile Glu Gly Ser Leu Lys Phe Glu Met Gly Leu Ala
65 70 75 80

Lys Lys Arg Thr Leu Arg Pro Gly Ala Val Pro Cys Val Asn Lys Ser
85 90 95

Gln Glu Ser Gly Ser Asp Arg Ala
100

<210> 82
<211> 96 
<212> PRT 

<213> Drosophila melanogaster 
<400> 82 

Met Gly Gly Thr Lys Cys Cys Phe Arg Asp Cys Pro Val Gly Ser Ser 

1 5 10 15

Arg Asn Pro Asn Met His Phe Phe Lys Phe Pro Val Lys Asp Pro Lys 

20 25 30 

Arg Leu Lys Asp Trp Val Arg Asn Cys Ser Asn Pro Asp Val Ser Asn 

35 40 45 

Ala Pro Pro Ser Lys Leu Ala Ala Lys Thr Val Cys Ala Arg His Phe 

50 55 60 

Arg Ala Glu Cys Phe Met Asn Tyr Lys Met Asp Arg Leu Ile Pro Met

65 70 75 80

Gln Thr Pro Thr Leu Phe Arg Ile Asn Arg Asp Leu Ala Leu Asp Tyr
85 90 95 



<210> 83 
<211> 96 
<212> PRT 

<213> Drosophila melanogaster 
<400> 83 

Met Ala Thr Arg Ser Cys Ala Tyr Lys Asp Cys Glu Tyr Tyr Tyr Val 

1 5 10 15

Gly His Glu Asn Ala Leu Thr Lys Gly Arg Thr Leu Phe Ala Phe Pro 

20 25 30 

Lys Gln Pro Gln Arg Ala Arg Ile Trp His Glu Asn Gly Gln Val His

35 40 45 

Pro Lys Ile Pro His Ser Gln Leu Phe Met Cys Ser Leu His Phe Asp

50 55 60 

Arg Lys Phe Ile Ser Ser Ser Lys Asn Arg Thr Leu Leu Val Gly Glu

65 70 75 80

Ala Val Pro Phe Pro Tyr Glu Glu Ser Ser Ser Lys Pro Glu Glu Glu 
85 90 95 

<210> 84 
<211> 87 
<212> PRT 

<213> Drosophila melanogaster 
<400> 84 

Met Lys Tyr Cys Lys Phe Cys Cys Lys Ala Val Thr Gly Val Lys Leu 

1 5 10 15

Ile His Val Pro Lys Cys Ala Ile Lys Arg Lys Leu Trp Glu Gln Ser
20 25 30

Leu Gly Cys Ser Leu Gly Glu Asn Ser Gln Ile Cys Asp Thr His Phe
35 40 45

Asn Asp Ser Gln Trp Lys Ala Ala Pro Ala Lys Gly Gln Thr Phe Lys
50 55 60

Arg Arg Arg Leu Asn Ala Asp Ala Val Pro Ser Lys Val Ile Glu Pro
65 70 75 80

Glu Pro Glu Lys Ile Lys Glu
85

<210> 85
<211> 92 
<212> PRT 

<213> Anopheles gambiae 
<400> 85

Met Pro Ala Ser Cys Val Ile Pro Asp Cys Asp Leu Lys Tyr Thr His
1 5 10 15

Gly Asp Asp Val Ser Phe His Lys Phe Pro Leu Lys Ser Pro Glu Leu
20 25 30

Leu Lys Gln Trp Ile Gln Phe Thr Gly Arg Asp Glu Gly Trp His Pro
35 40 45

Thr Lys Trp Ser Ala Leu Cys Ser Arg His Phe Val Ala Ser Asp Phe
50 55 60

Lys Gly Cys Ala Ala Arg Lys Ile Leu Leu Pro Thr Ala Val Pro Ser
65 70 75 80

Val Arg Asn Ala Val Ala Ala Lys Ala Gln Pro Asn
85 90



<210> 86 
<211> 108 
<212> PRT 

<213> Anopheles gambiae 
<400> 86

Met Ser Ala Val Arg Ser Cys Ala Leu Cys Gln Asn Arg Ser Asn Ile
1 5 10 15

Thr Asp Gln Gln Thr Asp Asp Ala Leu Glu Arg Ile Thr Tyr His Lys
20 25 30

Phe Pro Thr Asn Pro Val Arg Arg Asp Arg Trp Ile Glu Phe Cys Asp
35 40 45

Leu Pro Lys Glu Ser Phe Pro Lys Ser Ala Tyr Lys Phe Leu Cys Ser
50 55 60

Ser His Phe Thr Pro Glu Cys Phe Glu Arg Asp Leu Arg Gly Glu Leu
65 70 75 80

Leu Tyr Gly Thr Lys Arg Met Thr Leu Gln Lys Asp Ala Met Pro Thr
85 90 95

Ile Arg Ser Val Ser Gln Gln Leu Lys Arg Thr Thr
100 105

<210> 87 
<211> 100 
<212> PRT 

<213> Anopheles gambiae 



<400> 87

Met Trp Asp Cys Ala Val Ile Gly Cys Pro Asn Ser Arg Phe Asn Ala
1 5 10 15

Gln Lys Thr Arg Pro Arg Ile Ser Phe His Val Phe Pro His Pro Val
20 25 30

Arg Glu Ser Asn Arg Phe Arg Arg Trp Leu Ala Leu Ile Asn Asn Pro
35 40 45

Arg Leu Phe Arg Leu Asp Pro Leu Asn Val Phe Lys Ser Val Arg Val
50 55 60

Cys Arg Arg His Phe Gly Pro Asp Cys Phe Asn Gly Val Cys Arg Asn
65 70 75 80

Leu Leu Pro Thr Ala Ile Pro Thr Leu Asn Leu Pro Glu Val Arg Pro
85 90 95

Val Ala Leu Val
100

<210> 88 
<211> 95 
<212> PRT 

<213> Anopheles gambiae 



<400> 88 

Met Gly Ile Arg Lys Cys Ile Val Pro Glu Cys Pro Ser Ser Ser Ala
1 5 10 15

Arg Pro Glu Asp Arg Gly Val Thr Tyr uL Lys Ile Pro Tyr Leu Asp
20 25 30

Glu Met Lys Arg Leu Trp Ile Val £ a Cys His Leu Pro £p Asp Tyr
35 40 45

Phe Ala Thr Lys Ala Ser Asn Val Cys Ser Arg His Phe Arg Arg Ala
50 55 60

Asp Phe Gln Glu Phe Lys Gly Lys Lys Tyr Val Leu Lys Leu Gly Val
65 70 75 80

Val Pro Thr Val Phe Pro Trp Thr Val Thr Lys Pro Pro Gly Glu
85 90 95

<210> 89 
<211> 107 
<212> PRT 

<213> Anopheles gambiae 
<400> 89 

Met Gly Lys Ile Ser Gly Ser His Cys Leu Val Leu Gly Cys Arg Asn
1 5 10 15

Arg Gln Leu Leu Asn Gln Ala Asn Ile Arg Ser Tyr Phe Arg Phe Pro
20 25 30

Arg Asp Ala Asp Leu Cys Lys Lys Trp Val Asp Phe Cys Asn Arg Pro
35 40 45

Glu Leu Tyr Lys Lys Tyr Asp Glu Asn Gly Pro Glu Tyr Leu Tyr Lys
50 55 60

Ser Ser Arg Ile Cys Ser Asp His Phe Gln Pro Sa Asp Phe Asn Asn
65 70 75 80

Pro Asn Leu Phe Ser Gln Gly Leu Lys Lys Gly Ser Val Pro Ser Val
85 90 95

Asn Pro Ala Asn Leu Glu Ala Ala Lys Pro His
100 105



<210> 90 
<211> 104 
<212> PRT 

<213> Anopheles gambiae 
<400> 90 

Met Thr Asn Cys Ser Cys Ala Val Ala Asp Cys Asn Asn Asn Arg Arg
1 5 10 15

Asn Val Arg Lys Arg Met Leu Asp Ile Gly Phe His Thr Phe Pro Ser
20 25 30

Asp Pro Val Gln Arg Gln Arg Trp Val Lys Phe Cys Gln Arg Glu Pro
35 40 45

Ser Trp Gln Pro Lys Ser Cys Asp Ser Met Cys Ser Val His Phe Lys
50 55 60

Asp Thr Asp Tyr Gln Met Ser His Ser Pro Leu Ile Arg Leu Ala Thr
65 70 75 80

Asn Leu Arg Arg Leu Lys Pro Asp Val Ile Pro Thr Ile Arg Lys Gly
85 90 95

Arg Ala Ile Pro Val Ala Ala Arg
100



<210> 91 
<211> 95 
<212> PRT 

<213> Anopheles gambiae 

Me^GlyW Cys Arg Cys Thr Phe Arg Asp Cys Glu Asn Gly Thr Ala 

Ser Arg Lys Glu Leu His Tyr Phe Arg Tyr Pro Val Arg Asp Gln Glu

20 25 30

Arg Leu Ile Glu Trp Ala Lys Asn Ala Asp Arg Leu Glu Phe Val Asp

35 40 45 

Leu Pro Val Asp Lys Val Ser Asn Lys Val Val Cys Gln Glu Mrs Phe



50 55 60 

u 

!L Ile Pro Arg Leu Met Val Met Pro Asp Glu Thr Ile Val Asn



Glu Arg Lys Met Phe Met Asn Asp Leu Arg Asp Arg Leu Thr Lys Met 



70 75 
Asp Gli 

85 90 95 



<210> 92 
<211> 97 
<212> PRT 

<213> Anopheles gambiae 

<400> 92

Met Lys Cys Phe Val Ser Gly Cys Asp Thr Asp Asp Asn Val Val Ser
1 5 10 15

Tyr Thr Ser Val Phe Tyr Val Asn Cys Pro Thr Asp Pro Thr Ile Gln
20 25 30

Gln Gln Trp Phe Thr Leu Leu Glu Val Thr Asp Pro Asp Ala Met Arg
35 40 45

Ala Leu Val Asp Gly Arg Ser Lys Val Cys Ser Cys His Phe Thr Glu
50 55 60

Asp Cys Phe Gly His His Pro Val Tyr Gly Tyr Arg Tyr Leu Leu Ala
65 70 75 80

Thr Ala Leu Pro Thr Val Phe Pro Pro Arg Lys Glu Ile Glu Gln Pro
85 90 95

Lys



<210> 93 

<211> 92 

<212> PRT 

<213> Bombyx mori 

<400> 93

Met Pro Arg Cys Ser Val Ile Val Cys Lys Asn Asn Ser Cys Ile Val
1 5 10 15

Asn Tyr Lys Lys Asp Ser Ile Ser Phe His Thr Tyr Pro Lys Asp Pro
20 25 30

Lys Ile Lys Glu Met Trp Ile Asn Ala Thr Gly Arg Gly Pro Ser Trp
35 40 45

Phe Pro Thr Lys Asn His Thr Ile Cys Ser Ser His Phe Glu Pro Lys
50 55 60

Cys Phe Gln Pro Leu Lys Lys Val Arg Arg Leu Phe Glu Trp Ser Val
65 70 75 80

Pro Thr Leu Lys Leu Arg Met Val Leu Met Asn Tyr
85 90



<210> 94 

<211> 96 

<212> PRT 

<213> Bombyx mori 



<400> 94

Met Pro Asp Thr His Arg Thr Cys Glu Val Cys Gly Ile Lys Glu Arg
1 5 10 15

His Leu Thr Glu Lys Arg Phe Phe Ala Arg Phe Pro Leu Asp Val Asn
20 25 30

Arg Cys Lys Gln Trp Val Lys Met Val Gly Lys Glu Asp Leu Ala Tyr
35 40 45

Leu Gln Val His Met Leu His Asp Leu Lys His Val Cys Glu Ala His
50 55 60

Phe Ser Arg Arg Asp Phe Thr Lys Ser Lys Lys Arg Leu Lys Lys Arg
65 70 75 80

Ala Val Pro Lys Leu Asn Leu Thr Leu Pro Pro Leu Arg Asp Glu Ile
85 90 95



<210> 95 
<211> 89 
<212> PRT 

<213> Caenorhabditis elegans 
<400> 95

Met Pro Thr Thr Cys Gly Phe Pro Asn Cys Lys Phe Arg Ser Arg Tyr
1 5 10 15

Arg Gly Leu Glu Asp Asn Arg His Phe Tyr Arg Ile Pro Lys Arg Pro
20 25 30

Leu Ile Leu Arg Gln Arg Trp Leu Thr Ala Ile Gly Arg Thr Glu Glu
35 40 45

Thr Val Val Ser Gln Leu Arg Ile Cys Ser Ala His Phe Glu Gly Gly
50 55 60

Glu Lys Lys Glu Gly Asp Ile Pro Val Pro Asp Pro Thr Val Asp Lys
65 70 75 80

Gln Ile Lys Ile Glu Leu Pro Pro Lys
85

<210> 96 
<211> 100 
<212> PRT 

<213> Caenorhabditis elegans 
<400> 96 

Met Tyr Gly Val Gln Ser Glu Cys Val Leu Cys Ala His Ala Asn Asp
1 5 10 15

Cys Thr Ala Met Ile Pro Phe Pro Gly Pro Asp Asp Glu Lys Leu Arg
20 25 30

Thr Lys Trp Ile Asn Ser Met Cys Arg Glu Pro Trp Ile Tyr Arg Tyr
35 40 45

Leu Ser Thr Arg Leu Glu Lys Pro Gly Arg His Tyr Leu Cys Ala Ser 
50 55 60

His Phe Asn Arg Asn Ser Leu Arg Tyr His Ala Gly Leu Gly Leu Trp 
65 70 75 80 

Arg Arg Ala Ala Ala Cys Pro Val Leu Ala Cys Thr Thr Asp Glu Glu
85 90 95

Arg Gln Glu Val
100 



<210> 97 
<211> 86 
<212> PRT 

<213> Caenorhabditis elegans 
<400> 97 

Met Glu His Pro Leu Gln Cys Cys Tyr Cys Leu Glu Val Tyr Glu Lys
1 5 10 15

Arg Tyr Met Thr Gln Val Pro Lys Thr Glu Gln Arg Ile Ala Arg Trp
20 25 30

Val Ala Ile Leu Gly Glu Gln Phe Arg Ile Arg Leu Arg Met Lys Pro
35 40 45

Ala Asn Tyr Met Cys Arg Lys His Phe Pro Gln Ala Asp Phe Ser Ser
50 55 60

Arg Gly Arg Leu Leu Lys Thr Ala Val Pro Asn Val Val Ser Gln Glu
65 70 75 80

Lys Val Leu Ala Phe Lys 
85 



<210> 98 
<211> 97 
<212> PRT 

<213> Caenorhabditis elegans 
<400> 98 

Asn Leu Thr His Lys Pro Cys Thr Val Cys Asn Arg Val Met Lys Ser
1 5 10 15

Gly Glu Met His Leu Asn Phe Pro Ala Asp Leu Asp Arg Arg Arg Ile
20 25 30

Trp Ala Asn Leu Leu Gly Phe Lys Tyr Lys Asp Ile Leu Arg Ser Lys
35 40 45

Met Gly Pro Val Ser Phe Ser Ile Ala Ala Gly Pro Ile Cys Thr Glu
50 55 60

His Phe Ala Glu Glu Cys Phe Arg Asn His Asn Phe Asn Lys Ser Ala
65 70 75 80

Ile Glu Ala Phe Gly Val Pro Val Ala Ile Ser Pro Asp Val Lys Thr
85 90 95

Thr 



<210> 99 

<211> 210 

<212> PRT 

<213> Mus musculus 

<400> 99

Met Val Gln Ser Cys Ser Ala Tyr Gly Cys Lys Asn Arg Tyr Asp Lys
1 5 10 15

Asp Lys Pro Val Ser Phe His Lys Phe Pro Leu Thr Arg Pro Ser Leu
20 25 30

Cys Lys Gln Trp Glu Ala Ala Val Lys Arg Lys Asn Phe Lys Pro Thr
35 40 45

Lys Tyr Ser Ser Ile Cys Ser Glu His Phe Thr Pro Asp Cys Phe Lys
50 55 60

Arg Glu Cys Asn Asn Lys Leu Leu Lys Glu Asn Ala Val Pro Thr Ile
65 70 75 80

Phe Leu Tyr Ile Glu Pro His Glu Lys Lys Glu Asp Leu Glu Ser Gln
85 90 95

Glu Gln Leu Pro Ser Pro Ser Pro Pro Ala Ser Gln Val Asp Ala Ala
100 105 110

Ile Gly Leu Leu Met Pro Pro Leu Gln Thr Pro Asp Asn Leu Ser Val
115 120 125

Phe Cys Asp His Asn Tyr Thr Val Glu Asp Thr Met His Gln Arg Lys
130 135 140

Arg Ile Leu Gln Leu Glu Gln Gln Val Glu Lys Leu Arg Lys Lys Leu
145 150 155 160

Lys Thr Ala Gln Gln Arg Cys Arg Arg Gln Glu Arg Gln Leu Glu Lys
165 170 175

Leu Lys Glu Val Val His Phe Gln Arg Glu Lys Asp Asp Ala Ser Glu
180 185 190

Arg Gly Tyr Val Ile Leu Pro Asn Asp Tyr Phe Glu Ile Val Glu Val
195 200 205

Pro Ala
210



<210> 100 
<211> 217 
<212> PRT 
<213> Mus musculus

<400> 100

Met Pro Thr Asn Cys Ala Ala Ala Gly Cys Ala Ala Thr Tyr Asn Lys
1 5 10 15

His Ile Asn Ile Ser Phe His Arg Phe Pro Leu Asp Pro Lys Arg Arg
20 25 30

Lys Glu Trp Val Arg Leu Val Arg Arg Lys Asn Phe Val Pro Gly Lys
35 40 45

His Thr Phe Leu Cys Ser Lys His Phe Glu Ala Ser Cys Phe Asp Leu
50 55 60

Thr Gly Gln Thr Arg Arg Leu Lys Met Asp Ala Val Pro Thr Ile Phe
65 70 75 80

Asp Phe Cys Thr His Ile Lys Ser Leu Lys Leu Lys Ser Arg Asn Leu
85 90 95

Leu Lys Thr Asn Asn Ser Phe Pro Pro Thr Gly Pro Cys Asn Leu Lys
100 105 110

Leu Asn Gly Ser Gln Gln Val Leu Leu Glu His Ser Tyr Ala Phe Arg
115 120 125

Asn Pro Met Glu Ala Lys Lys Arg Ile Ile Lys Leu Glu Lys Glu Ile
130 135 140

Ala Ser Leu Arg Lys Lys Met Lys Thr Cys Leu Gln Arg Glu Arg Arg
145 150 155 160

Ala Thr Arg Arg Trp Ile Lys Ala Thr Cys Phe Val Lys Ser Leu Glu
165 170 175

Ala Ser Asn Met Leu Pro Lys Gly Ile Ser Glu Gln Ile Leu Pro Thr
180 185 190

Ala Leu Ser Asn Leu Pro Leu Glu Asp Leu Lys Ser Leu Glu Gln Asp
195 200 205

Gln Gln Asp Lys Thr Val Pro Ile Leu
210 215

<210> 101
<211> 218 
<212> PRT 

<213> Mus musculus 

<400> 101

Met Pro Lys Ser Cys Ala Ala Arg Gln Cys Cys Asn Arg Tyr Ser Ser
1 5 10 15

Arg Arg Lys Gln Leu Thr Phe His Arg Phe Pro Phe Ser Arg Pro Glu
20 25 30

Leu Leu Arg Glu Trp Val Leu Asn Ile Gly Arg Ala Asp Phe Lys Pro
35 40 45

Lys Gln His Thr Val Ile Cys Ser Glu His Phe Arg Pro Glu Cys Phe
50 55 60

Ser Ala Phe Gly Asn Arg Lys Asn Leu Lys His Asn Ala Val Pro Thr
65 70 75 80

Val Phe Ala Phe Gln Asn Pro Thr Glu Val Cys Pro Glu Val Gly Ala
85 90 95

Gly Gly Asp Ser Ser Gly Arg Asn Met Asp Thr Thr Leu Glu Glu Leu
100 105 110

Gln Pro Pro Thr Pro Glu Gly Pro Val Gln Gln Val Leu Pro Asp Arg
115 120 125

Glu Ala Met Glu Ala Thr Glu Ala Ala Gly Leu Pro Ala Ser Pro Leu
130 135 140

Gly Leu Lys Arg Pro Leu Pro Gly Gln Pro Ser Asp His Ser Tyr Ala
145 150 155 160

Leu Ser Asp Leu Asp Thr Leu Lys Lys Lys Leu Phe Leu Thr Leu Lys
165 170 175

Glu Asn Lys Arg Leu Arg Lys Arg Leu Lys Ala Gln Arg Leu Leu Leu
180 185 190

Arg Arg Thr Cys Gly Arg Leu Arg Ala Tyr Arg Glu Gly Gln Pro Gly
195 200 205

Pro Arg Ala Arg Arg Pro Ala Gln Gly Ser
210 215



<210> 102 
<211> 205 
<212> PRT 

<213> Mus musculus 

<400> 102

Met Val Ile Cys Cys Ala Ala Val Asn Cys Ser Asn Arg Gln Gly Lys
1 5 10 15

Gly Glu Lys Arg Ala Val Ser Phe His Arg Phe Pro Leu Lys Asp Ser
20 25 30

Lys Arg Leu Ile Gln Trp Leu Lys Ala Val Gln Arg Asp Asn Trp Thr
35 40 45

Pro Thr Lys Tyr Ser Phe Leu Cys Ser Glu His Phe Thr Lys Asp Ser
50 55 60

Phe Ser Lys Arg Leu Glu Asp Gln His Arg Leu Leu Lys Pro Thr Ala
65 70 75 80

Val Pro Ser Ile Phe His Leu Ser Glu Lys Lys Arg Gly Ala Gly Gly
85 90 95

His Gly His Ala Arg Arg Lys Thr Thr Ala Ala Met Arg Gly His Thr
100 105 110

Ser Ala Glu Thr Gly Lys Gly Thr Ile Gly Ser Ser Leu Ser Ser Ser
115 120 125

Asp Asn Leu Met Ala Lys Pro Glu Ser Arg Lys Leu Lys Arg Ala Ser
130 135 140

Leu Gln Asp Asp Ala Ala Pro Lys Val Thr Pro Gly Ala Val Ser Gln
145 150 155 160

Glu Gln Gly Gln Ser Leu Glu Lys Thr Pro Gly Asp Asp Pro Ala Ala
165 170 175

Pro Leu Ala Arg Gly Gln Glu Glu Ala Gln Ala Ser Ala Thr Glu Ala
180 185 190

Asp His Gln Lys Ala Ser Ser Ser Thr Asp Ala Glu Gly
195 200 205



<210> 103 
<211> 186 
<212> PRT 

<213> Mus musculus 



<400> 103

Ile Leu Gln Ala Phe Gly Ser Leu Lys Lys Gly Asp Val Leu Cys Ser
1 5 10 15

Arg His Phe Lys Lys Thr Asp Phe Asp Arg Ser Thr Leu Asn Thr Lys
20 25 30

Leu Lys Ala Gly Ala Ile Pro Ser Ile Phe Glu Cys Pro Tyr His Leu
35 40 45

Gln Glu Lys Arg Glu Lys Leu His Cys Arg Lys Asn Phe Leu Leu Lys
50 55 60

Thr Leu Pro Ile Thr His His Gly Arg Gln Leu Val Gly Ala Ser Cys
65 70 75 80

Ile Glu Glu Phe Glu Pro Gln Phe Ile Phe Glu His Ser Tyr Ser Val
85 90 95

Met Asp Ser Pro Lys Lys Leu Lys His Lys Leu Asp Arg Val Ile Ile
100 105 110

Glu Leu Glu Asn Thr Lys Glu Ser Leu Arg Asn Val Leu Ala Arg Glu
115 120 125

Lys His Phe Gln Lys Ser Leu Arg Lys Thr Ile Met Glu Leu Lys Asp
130 135 140

Glu Ser Leu Ile Ser Gln Glu Thr Ala Asn Ser Leu Gly Ala Phe Cys
145 150 155 160

Trp Glu Cys Tyr His Glu Ser Thr Ala Gly Gly Cys Ser Cys Glu Val
165 170 175

Ile Ser Tyr Met Leu His Leu Gln Leu Thr
180 185



<210> 104 
<211> 194 
<212> PRT 

<213> Mus musculus 



<400> 104

Met Pro Arg His Cys Ser Ala Ala Gly Cys Cys Thr Arg Asp Thr Arg
1 5 10 15

Glu Thr Arg Asn Arg Gly Ile Ser Phe His Arg Leu Pro Lys Lys Asp
20 25 30

Asn Pro Arg Arg Gly Leu Trp Leu Ala Asn Cys Gln Arg Leu Asp Pro
35 40 45

Ser Gly Gln Gly Leu Trp Asp Pro Thr Ser Glu Tyr Ile Tyr Phe Cys
50 55 60

Ser Lys His Phe Glu Glu Asn Cys Phe Glu Leu Val Gly Ile Ser Gly
65 70 75 80

Tyr His Arg Leu Lys Glu Gly Ala Val Pro Thr Ile Phe Glu Ser Phe
85 90 95

Ser Lys Leu Arg Arg Thr Ala Lys Thr Lys Gly His Gly Tyr Pro Pro
100 105 110

Gly Leu Pro Asp Val Ser Arg Leu Arg Arg Cys Arg Lys Arg Cys Ser
115 120 125

Glu Arg Gln Gly Pro Thr Thr Pro Phe Ser Pro Pro Pro Arg Ala Asp
130 135 140

Ile Ile Cys Phe Pro Val Glu Glu Ala Ser Ala Pro Ala Thr Leu Pro
145 150 155 160

Ala Ser Pro Ala Val Arg Leu Asp Pro Gly Leu Asn Ser Pro Phe Ser
165 170 175

Asp Leu Leu Gly Pro Leu Gly Ala Gln Ala Asp Glu Ala Gly Cys Ser
180 185 190

Thr Gln



<210> 105 
<211> 305 
<212> PRT 
<213> Mus musculus 

<400> 105

Met Pro Gly Phe Thr Cys Cys Val Pro Gly Cys Tyr Asn Asn Ser His
1 5 10 15

Arg Asp Lys Ala Leu His Phe Tyr Thr Phe Pro Lys Asp Ala Glu Leu
20 25 30

Arg Arg Leu Trp Leu Lys Asn Val Ser Arg Ala Gly Val Ser Gly Cys
35 40 45

Phe Ser Thr Phe Gln Pro Thr Thr Gly His Arg Leu Cys Ser Val His
50 55 60

Phe Gln Gly Gly Arg Lys Thr Tyr Thr Val Arg Val Pro Thr Ile Phe
65 70 75 80

Pro Leu Arg Gly Val Asn Glu Arg Lys Val Ala Arg Arg Pro Ala Gly
85 90 95

Ala Ala Ala Ala Arg Arg Arg Gln Gln Gln Gln Gln Gln Gln Gln Gln
100 105 110

Gln Gln Gln Gln Gln Gln Leu Gln Gln Gln Gln Pro Ser Pro Ser Ser
115 120 125

Ser Thr Ala Gln Thr Thr Gln Leu Gln Pro Asn Leu Val Ser Ala Ser
130 135 140

Ala Ala Val Leu Leu Thr Leu Gln Ala Ala Val Asp Ser Asn Gln Ala
145 150 155 160

Pro Gly Ser Val Val Pro Val Ser Thr Thr Pro Ser Gly Asp Asp Val
165 170 175

Lys Pro Ile Asp Leu Thr Val Gln Val Glu Phe Ala Ala Ala Glu Gly
180 185 190

Ala Ala Ala Ala Ala Ala Ala Ser Glu Leu Glu Ala Ala Thr Ala Gly
195 200 205

Leu Glu Ala Ala Glu Cys Thr Leu Gly Pro Gln Leu Val Val Val Gly
210 215 220

Glu Glu Gly Phe Pro Asp Thr Gly Ser Asp His Ser Tyr Ser Leu Ser
225 230 235 240

Ser Gly Thr Thr Glu Glu Glu Leu Leu Arg Lys Leu Asn Glu Gln Arg
245 250 255

Asp Ile Leu Ala Leu Met Glu Val Lys Met Lys Glu Met Lys Gly Ser
260 265 270

Ile Arg His Leu Arg Leu Thr Glu Ala Lys Leu Arg Glu Glu Leu Arg
275 280 285

Glu Lys Asp Arg Leu Leu Ala Met Ala Val Ile Arg Lys Lys His Gly
290 295 300

Met
305

<210> 106
<211> 305 
<212> PRT 

<213> Mus musculus

<400> 106

Met Pro Gly Phe Thr Cys Cys Val Pro Gly Cys Tyr Asn Asn Ser His
1 5 10 15

Arg Asp Lys Ala Leu His Phe Tyr Thr Phe Pro Lys Asp Ala Glu Leu
20 25 30

Arg Arg Leu Trp Leu Lys Asn Val Ser Arg Ala Gly Val Ser Gly Cys
35 40 45

Phe Ser Thr Phe Gln Pro Thr Thr Gly His Arg Leu Cys Ser Val His
50 55 60

Phe Gln Gly Gly Arg Lys Thr Tyr Thr Val Arg Val Pro Thr Ile Phe
65 70 75 80

Pro Leu Arg Gly Val Asn Glu Arg Lys Val Ala Arg Arg Pro Ala Gly
85 90 95

Ala Ala Ala Ala Arg Arg Arg Gln Gln Gln Gln Gln Gln Gln Gln Gln
100 105 110

Gln Gln Gln Gln Gln Gln Leu Gln Gln Gln Gln Pro Ser Pro Ser Ser
115 120 125

Ser Thr Ala Gln Thr Thr Gln Leu Gln Pro Asn Leu Val Ser Ala Ser
130 135 140

Ala Ala Val Leu Leu Thr Leu Gln Ala Ala Val Asp Ser Asn Gln Ala
145 150 155 160

Pro Gly Ser Val Val Pro Val Ser Thr Thr Pro Ser Gly Asp Asp Val
165 170 175

Lys Pro Ile Asp Leu Thr Val Gln Val Glu Phe Ala Ala Ala Glu Gly
180 185 190

Ala Ala Ala Ala Ala Ala Ala Ser Glu Leu Glu Ala Ala Thr Ala Gly
195 200 205

Leu Glu Ala Ala Glu Cys Thr Leu Gly Pro Gln Leu Val Val Val Gly
210 215 220

Glu Glu Gly Phe Pro Asp Thr Gly Ser Asp His Ser Tyr Ser Leu Ser
225 230 235 240

Ser Gly Thr Thr Glu Glu Glu Leu Leu Arg Lys Leu Asn Glu Gln Arg
245 250 255

Asp Ile Leu Ala Leu Met Glu Val Lys KjJL U i v ie u Lys Cjjiy Ser
260 265 270

Ile Arg His Leu Arg Leu Thr Glu Ala Lys Leu Arg Glu Glu Leu Arg
275 280 285

Glu Lys Asp Arg Leu Leu Ala Met Ala Val Ile Arg Lys Lys His Gly
290 295 300

Met
305

<210> 107
<211> 652
<212> PRT
<213> Mus musculus

<400> 107

Met Pro Asn Phe Cys Ala Ala Pro Asn Cys Thr Arg Lys Ser Thr Gln

1 5 10 15

Ser Asp Leu Ala Phe Phe Arg Phe Pro Arg Asp Pro Ala Arg Cys Gln

20 25 30

Lys Trp Val Glu Asn Cys Arg Arg Ala Asp Leu Glu Asp Lys Thr Pro

35 40 45

Asp Gln Leu Asn Lys His Tyr Arg Leu Cys Ala Lys His Phe Glu Thr

50 55 60

Ser Met Ile Cys Arg Thr Ser Pro Tyr Arg Thr Val Leu Arg Asp Asn

65 70 75 80

Ala Ile Pro Thr Ile Phe Asp Leu Thr Ser His Leu Asn Asn Pro His

85 90 95

Ser Arg His Arg Lys Arg Ile Lys Glu Leu Ser Glu Asp Glu Ile Arg

100 105 110

Thr Leu Lys Gln Lys Lys Ile Glu Glu Thr Ser Glu Gln Glu Gln Glu

115 120 125

Thr Asn Thr Asn Ala Gln Asn Pro Ser Ala Glu Ala Val Asn Gln Gln

130 135 140

Asp Ala Asn Val Leu Pro Leu Thr Leu Glu Glu Lys Glu Asn Lys Glu

145 150 155 160

Tyr Leu Lys Ser Leu Phe Glu Ile Leu Val Leu Met Gly Lys Gln Asn

165 170 175

Ile Pro Leu Asp Gly His Glu Ala Asp Glu Val Pro Glu Gly Leu Phe

180 185 190

Ala Pro Asp Asn Phe Gln Ala Leu Leu Glu Cys Arg Ile Asn Ser Gly

195 200 205

Glu Glu Val Leu Arg Lys Arg Ile Glu Ala Thr Ala Val Asn Thr Leu

210 215 220

Phe Cys Ser Lys Thr Gln Gln Arg His Met Leu Glu Ile Cys Glu Ser

225 230 235 240

Cys Ile Arg Glu Glu Thr Leu Arg Glu Val Arg Asp Ser His Phe Phe

245 250 255

Ser Ile Ile Thr Asp Asp Val Val Asp Ile Ala Gly Glu Glu His Leu

260 265 270

Pro Val Leu Val Arg Phe Val Asp Asp Ala His Asn Leu Arg Glu Glu

275 280 285

Phe Val Gly Phe Leu Pro Tyr Glu Ala Asp Ala Glu Ile Leu Ala Val

290 295 300

Lys Phe His Thr Thr Ile Thr Glu Lys Trp Gly Leu Asn Met Glu Tyr

305 310 315 320

Cys Arg Gly Gln Ala Tyr Ile Val Ser Ser Gly Phe Ser Ser Lys Met

325 330 335

Lys Val Val Ala Ser Arg Leu Leu Glu Lys Tyr Pro Gln Ala Val Tyr

340 345 350

Thr Leu Cys Ser Ser Cys Ala Leu Asn Ala Trp Leu Ala Lys Ser Val

355 360 365

Pro Val Ile Gly Val Ser Val Ala Leu Gly Thr Ile Glu Glu Val Cys

370 375 380

Ser Phe Phe His Arg Ser Pro Gln Leu Leu Leu Glu Leu Asp Ser Val

385 390 395 400

Glu Ile Cys His Ser Gln Trp Thr Gly Arg His Asp Ala Phe Glu Ile

405 410 415

Leu Val Asp Leu Leu Gln Ala Leu Val Leu Cys Leu Asp Gly Ile Ile

420 425 430

Asn Ser Asp Thr Asn Val Arg Trp Asn Asn Tyr Ile Ala Gly Arg Ala

435 440 445

Ile Ser Val Leu Phe Gln Asn Ser Glu Glu Arg Ala Lys Glu Leu Lys

450 455 460

Phe Val Leu Cys Ser Ala Val Thr Asp Phe Asp Phe Ile Val Thr Ile

465 470 475 480

Val Val Leu Lys Asn Val Leu Ser Phe Thr Arg Ala Phe Gly Lys Asn

485 490 495

Leu Gln Gly Gln Thr Ser Asp Val Phe Phe Ala Ala Ser Ser Leu Thr

500 505 510

Ala Val Leu His Ser Leu Asn Glu Val Met Glu Asn Ile Glu Val Tyr

515 520 525

His Glu Phe Trp Phe Glu Glu Ala Thr Asn Leu Ala Thr Lys Leu Asp

530 535 540

Ile Gln Met Lys Leu Pro Gly Lys Phe Arg

545 550

Leu Glu Ser Gln Leu Thr Ser Glu Ser Tyr Tyr Lys Asp Thr Leu Ser

565 570 575

Val Pro Thr Val Glu His Ile Ile Gln Glu Leu Lys Asp Ile Phe Ser

580 585 590

oxn xii S Leu Lys Ala Leu Lys Cys Leu Ser Leu Val Pro Ser Val

595 600 605

Met Gly Gln Leu Lys Phe Asn Thr Ser Glu Glu His His Ala Asp Met

610 615 620

Tyr Arg Ser Asp Leu Pro Asn Pro Asp Thr Leu Ser Ala Glu Leu His

625 630 635 640

Cys Trp Arg Ile Lys Trp Lys His Arg Gly

645 650

<210> 108 
<211> 180 
<212> PRT 

<213> Rattus norvegicus 
<220> 

<223> RAT THAP 

<221> UNSURE 
<222> 95 

<223> Xaa = any of the twenty amino acids 
<400> 108 

Arg Gln Cys Cys Asn Arg Tyr Ser Ser Arg Arg Lys Gln Leu Thr Phe

1 5 10 15

His Arg Phe Pro Phe Ser Arg Pro Glu Leu Leu Arg Glu Trp Val Leu

20 25 30

Asn Ile Gly Arg Ala Asp Phe Lys Pro Lys Gln His Thr Val Ile Cys

35 40 45

Ser Glu His Phe Arg Pro Glu Cys Phe Ser Ala Phe Gly Asn Arg Lys

50 55 60

Asn Leu Lys His Asn Ala Val Pro Thr Val Phe Ala Phe Gln Asn Pro

65 70 75 80

Ala Gln Val Cys Pro Glu Val Gly Ala Gly Gly Asp Ser Ser Xaa Arg

85 90 95

Asn Met Asp Ala Thr Leu Glu Glu Leu Gln Ser Pro Asn Thr Glu Gly

100 105 110

Pro Met Gln Gln Val Leu Pro Asp Arg Gln Ala Thr Glu Ala Met Glu

115 120 125

Ala Ala Gly Leu Pro Ala Gly Pro Leu Gly Leu Lys Arg Pro Leu Pro

130 135 140

Gly Gln Pro Ser Asp His Ser Tyr Ala Leu Leu Asp Leu Asp Thr Leu

145 150 155 160

Lys Lys Lys Leu Phe Leu Thr Leu Lys Glu Asn Lys Arg Leu Arg Lys

165 170 175

Arg Leu Lys Ala

180



<210> 109 
<211> 82 
<212> PRT 

<213> Rattus norvegicus 
<400> 109

Met Val Lys Cys Cys Ser Ala Ile Gly Cys Ala Ser Arg Cys Leu Pro

1 5 10 15

Asn Ser Lys Leu Lys Gly Leu Thr Phe His Val Phe Pro Thr Asp Glu

20 25 30

Asn Ile Lys Arg Lys Trp Val Leu Ala Met Lys Arg Leu Asp Val Asn

35 40 45

Thr Ala Gly Ile Trp Glu Pro Ser Leu Gln Pro Glu Ser Phe Tyr Phe

50 55 60

Ile Phe Met Glu Asn Leu Phe Phe Ile Leu Pro Pro Gln Leu Ser His

65 70 75 80

Ala Val

<210> 110 
<211> 309 
<212> PRT 

<213> Rattus norvegicus 



<400> 110

Met Pro Arg His Cys Ser Ala Ala Gly Cys Cys Thr Arg Asp Thr Arg

1 5 10 15

Glu Thr Arg Asn Arg Gly Ile Ser Phe His Arg Leu Pro Lys Lys Asp

20 25 30

Asn Pro Arg Arg Gly Leu Trp Leu Ala Asn Cys Gln Arg Leu Asp Pro

35 40 45

Ser Gly Gln Gly Leu Trp Asp Pro Thr Ser Glu Tyr Ile Tyr Phe Cys

50 55 60

Ser Lys His Phe Glu Glu Asn Cys Phe Glu Leu Val Gly Ile Ser Gly

65 70 75 80

Tyr His Arg Leu Lys Glu Gly Ala Val Pro Thr Ile Phe Glu Ser Phe

85 90 95

Ser Lys Leu Arg Arg Thr Ala Lys Thr Lys Val His Gly Tyr Pro Pro

100 105 110

Gly Leu Pro Asp Val Ser Arg Leu Arg Arg Cys Arg Lys Arg Cys Ser

115 120 125

Glu Arg Gln Gly Pro Thr Ile Pro Phe Ser Pro Pro Pro Arg Ala Asp

130 135 140

Ile Ile Arg Phe Pro Val Glu Glu Ala Ser Ala Pro Ala Thr Leu Pro

145 150 155 160

Ala Ser Pro Ala Ala Arg Leu Asp Pro Gly Leu Asn Ser Pro Phe Ser

165 170 175

Asp Leu Leu Gly Pro Leu Gly Ala Gln Ala Asp Glu Ala Gly Cys Ser

180 185 190

Ala Gln Pro Ser Pro Glu Gln His Pro Ser Pro Leu Glu Pro Gln His

195 200 205

Val Ser Pro Ser Thr Tyr Met Leu Arg Leu Pro Pro Pro Ala Gly Ala

210 215 220

Tyr Ile Gln Asn Glu His Ser Tyr Gln Val Gly Ser Ala Leu Leu Trp

225 230 235 240

Lys Arg Arg Ala Glu Ala Ala Leu Asp Ala Leu Asp Lys Thr Gln Arg

245 250 255

Gln Leu Gln Ala Cys Lys Arg Arg Glu Gln Arg Leu Arg Leu Arg Leu

260 265 270

Thr Lys Leu Gln Gln Glu Arg Ala Arg Glu Lys Arg Ala Gln Ala Asp

275 280 285

Ala Arg Gln Thr Leu Lys Asp His Val Gln Asp Phe Ala Met Gln Leu

290 295 300

Ser Ser Ser Met Ala

305



<210> 111 
<211> 142 
<212> PRT 

<213> Rattus norvegicus 
<400> 111 

Met Pro Asn Phe Cys Ala Ala Pro Asn Cys Thr Arg Lys Ser Thr Gln

1 5 10 15

Ser Asp Leu Ala Phe Phe Arg Phe Pro Arg Asp Pro Ala Arg Cys Gln

20 25 30

Lys Trp Val Glu Asn Cys Arg Arg Ala Asp Leu Glu Asp Lys Thr Pro

35 40 45

Asp Gln Leu Asn Lys His Tyr Arg Leu Cys Ala Lys His Phe Glu Thr

50 55 60

Ser Met Ile Cys Arg Thr Ser Pro Tyr Arg Thr Val Leu Arg Asp Asn

65 70 75 80

Ala Ile Pro Thr Ile Phe Asp Leu Thr Ser His Leu Asn Asn Pro His

85 90 95

Ser Arg His Arg Lys Arg Ile Lys Glu Leu Ser Glu Asp Glu Ile Arg

100 105 110

Thr Leu Lys Gln Lys Lys Ile Glu Glu Thr Ser Glu Gln Glu Gln Gly

115 120 125

Thr Asn Ser Asn Ala Gln Tyr Pro Ser Ala Glu Val Gly Asn

130 135 140



<210> 112 
<211> 104 
<212> PRT 
<213> Sus scrofa 

<400> 112 

Met Val Lys Cys Cys Ser Ala Ile Gly Cys Ala Ser Arg Cys Leu Pro

1 5 10 15

Asn Ser Lys Leu Lys Gly Leu Thr Phe His Val Phe Pro Thr Asp Glu

20 25 30

Lys Val Lys Arg Lys Trp Val Leu Ala Met Lys Arg Leu Asp Val Asn

35 40 45

Ala Ala Gly Met Trp Glu Pro Lys Lys Gly Asp Val Leu Cys Ser Arg

50 55 60

His Phe Lys Lys Thr Asp Phe Asp Arg Thr Thr Pro Asn Ile Lys Leu

65 70 75 80

Lys Pro Gly Val Ile Pro Ser Ile Phe Asp Ser Pro Ser His Leu tL

85 90 95

Gly Glu Glu Arg Lys Ala Pro Leu

100



<210> 113 

<211> 235 

<212> PRT 

<213> Sus scrofa 

<220> 

<221> UNSURE 
<222> 57, 124, 192 
<223> Xaa = any of the twenty amino acids

<400> 113

Met Pro Arg His Cys Ser Ala Ala Gly Cys Cys Thr Arg Asp Thr Arg

1 5 10 15

Glu Thr Arg Asn Arg Gly Ile Ser Phe His Arg Leu Pro Lys Lys Asp

20 25 30

Asn Pro Arg Arg Gly Leu Trp Leu Ala Asn Cys Gln Arg Leu Asp Pro

35 40 45

Ser Gly Gln Gly Leu Trp Asp Pro Xaa Ser Glu Tyr Ile Tyr Phe Cys

50 55 60

Ser Lys His Phe Glu Glu Asn Cys Phe Glu Leu Val Gly Ile Ser Gly

65 70 75 80

Tyr His Arg Leu Lys Glu Gly Ala Val Pro Thr Ile Phe Glu Ser Phe

85 90 95

Ser Lys Leu Arg Arg Thr Ala Lys Thr Lys Gly His Ser Tyr Pro Pro

100 105 110

Gly Pro Pro Asp Val Ser Arg Leu Arg Arg Cys Xaa Lys Arg Cys Ser

115 120 125

Glu Gly Arg Gly Pro Thr Thr Pro Phe Ser Pro Pro Pro Pro Ala Asp

130 135 140

Val Thr Cys Phe Pro Val Glu Glu Ala Ser Ala Pro Ala Ala Leu Ser

145 150 155 160

Ala Ser Pro Thr Gly Arg Leu Glu Pro Gly Leu Ser Ser Pro Phe Ser

165 170 175

Asp Leu Leu Gly Pro Leu Gly Ala Gln Ala Asp Glu Ala Gly Cys Xaa

180 185 190

Thr Gln Pro Ser Pro Glu Arg Glu Pro Glu Arg Gln Pro Ser Pro Leu

195 200 205

Glu Pro Arg Pro Val Ser Pro Ser Ala Tyr Met Leu Arg Leu Pro Pro

210 215 220

Pro Ala Gly Ala Tyr Ile Gln Asn Glu His Ser

225 230 235



<210> 114 
<211> 149 
<212> PRT 
<213> Sus scrofa 

<400> 114

Met Thr Arg Ser Cys Ser Ala Val Gly Cys Ser Thr Arg Asp Thr Val

1 5 10 15

Leu Ser Arg Glu Arg Gly Leu Ser Phe His Gln Phe Pro Thr Asp Thr

20 25 30

Ile Gln Arg Ser Gln Trp Ile Arg Ala Val Asn Arg Met Asp Pro Arg

35 40 45

Ser Lys Lys Ile Trp Ile Pro Gly Pro Gly Ala Met Leu Cys Ser Lys

50 55 60

His Phe Gln Glu Ser Asp Phe Glu Ser Tyr Gly Ile Arg Arg Lys Leu

65 70 75 80

Lys Lys Gly Ala Val Pro Ser Val Ser Leu Tyr Lys Val Leu Gln Gly

85 90 95

Ala His Leu Lys Gly Lys Ala Arg Gln Lys Ile Leu Lys Gln Pro Leu

100 105 110

Pro Asp Asn Ser Gln Glu Val Ala Thr Glu Asp His Asn Tyr Ser Leu

115 120 125

Lys Gly Pro Leu Thr Ile Gly Ala Glu Lys Leu Ala Glu Val Gln Gln

130 135 140

Met Leu Gln Val Ser

145
<210> 115

<211> 43
<212> PRT
<213> Mus musculus

<400> 115

Val Leu Glu Asp Val Ala Ala Ala Glu Gln Gly Leu Arg Glu Leu Gln

1 5 10 15

Arg Gly Arg Arg Gln Cys Arg Glu Arg Val Cys Ala Leu Arg Ala Ala

20 25 30

Ala Glu Gln Arg Glu Ala Arg Cys Arg Asp Gly

35 40



<210> 116 
<211> 45 
<212> PRT 
<213> Mus musculus

<400> 116

Gln Leu Glu Gln Gln Val Glu Lys Leu Arg Lys Lys Leu Lys Thr Ala

1 5 10 15

Gln Gln Arg Cys Arg Arg Gln Glu Arg Gln Leu Glu Lys Leu Lys Glu

20 25 30

Val Val His Phe Gln Arg Glu Lys Asp Asp Ala Ser Glu

35 40 45



<210> 117 

<211> 45 

<212> PRT 

<213> Homo sapiens 



<400> 117 

Gln Leu Glu Gln Gln Val Glu Lys Leu Arg Lys Lys Leu Lys Thr Ala

1 5 10 15

Gln Gln Arg Cys Arg Arg Gln Glu Arg Gln Leu Glu Lys Leu Lys Glu

20 25 30

Val Val His Phe Gln Lys Glu Lys Asp Asp Val Ser Glu

35 40 45



<210> 118 
<211> 342 
<212> PRT 
<213> Homo sapiens 



<400> 118 



Met Ala Thr Gly Gly Tyr Arg Thr Ser Ser Gly Leu Gly Gly Ser Thr

1 5 10 15

Thr Asp Phe Leu Glu Glu Trp Lys Ala Lys Arg Glu Lys Met Arg Ala

20 25 30

Lys Gln Asn Pro Pro Gly Pro Ala Pro Pro Gly Gly Gly Ser Ser Asp

35 40 45

Ala Ala Gly Lys Pro Pro Ala Gly Ala Leu Gly Thr Pro Ala Ala Ala

50 55 60

Ala Ala Asn Glu Leu Asn Asn Asn Leu Pro Gly Gly Ala Pro Ala Ala

65 70 75 80

Pro Ala Val Pro Gly Pro Gly Gly Val Asn Cys Ala Val Gly Ser Ala

85 90 95

Met Leu Thr Arg Ala Pro Pro Ala Arg Gly Pro Arg Arg Ser Glu Asp

100 105 110

Glu Pro Pro Ala Ala Ser Ala Ser Ala Ala Pro Pro Pro Gln Arg Asp

115 120 125

Glu Glu Glu Pro Asp Gly Val Pro Glu Lys Gly Lys Ser Ser Gly Pro

130 135 140

Ser Ala Arg Lys Gly Lys Gly Gln Ile Glu Lys Arg Lys Leu Arg Glu

145 150 155 160

Lys Arg Arg Ser Thr Gly Val Val Asn Ile Pro Ala Ala Glu Cys Leu

165 170 175

Asp Glu Tyr Glu Asp Asp Glu Ala Gly Gln Lys Glu Arg Lys Arg Glu

180 185 190

Asp Ala Ile Thr Gln Gln Asn Thr Ile Gln Asn Glu Ala Val Asn Leu

195 200 205

Leu Asp Pro Gly Ser Ser Tyr Leu Leu Gln Glu Pro Pro Arg Thr Val

210 215 220

Ser Gly Arg Tyr Lys Ser Thr Thr Ser Val Ser Glu Glu Asp Val Ser

225 230 235 240

Ser Arg Tyr Ser Arg Thr Asp Arg Ser Gly Phe Pro Arg Tyr Asn Arg

245 250 255

Aso Ala Asn Val Ser Gly Thr Leu Val Ser Ser Ser Thr Leu Glu Lys

260 265 270

Lys Ile Glu Asp Leu Glu Lys Glu Val Val Thr Glu Arg Gln Glu Asn

275 280 285

Leu Arg Leu Val Arg Leu Met Gln Asp Lys Glu Glu Met Ile Gly Lys

290 295 300

Leu Lys Glu Glu Ile Asp Leu Leu Asn Arg Asp Leu Asp Asp Ile Glu

305 310 315 320

Asp Glu Asn Glu Gln Leu Lys Gln Glu Asn Lys Thr Leu Leu Lys Val

325 330 335

Val Gly Gln Leu Thr Arg

340



<210> 119 
<211> 134 
<212> PRT 

<213> Homo sapiens 

<400> 119

Met Ala Gln Ser Leu Ala Leu Ser Leu Leu Ile Leu Val Leu Ala Phe

1 5 10 15

Gly Ile Pro Arg Thr Gln Gly Ser Asp Gly Gly Ala Gln Asp Cys Cys

20 25 30

Leu Lys Tyr Ser Gln Arg Lys Ile Pro Ala Lys Val Val Arg Ser Tyr

35 40 45

Arg Lys Gln Glu Pro Ser Leu Gly Cys Ser Ile Pro Ala Ile Leu Phe

50 55 60

Leu Pro Arg Lys Arg Ser Gln Ala Glu Leu Cys Ala Asp Pro Lys Glu

65 70 75 80

Leu Trp Val Gln Gln Leu Met Gln His Leu Asp Lys Thr Pro Ser Pro

85 90 95

Gln Lys Pro Ala Gln Gly Cys Arg Lys Asp Arg Gly Ala Ser Lys Thr

100 105 110

Gly Lys Lys Gly Lys Gly Ser Lys Gly Cys Lys Arg Thr Glu Arg Ser

115 120 125

Gln Thr Pro Lys Gly Pro

130



<210> 120 
<211> 766 
<212> PRT

<213> Drosophila melanogaster
<400> 120

Met Lys Tyr Cys Lys Phe Cys Cys Lys Ala Val Thr Gly Val Lys Leu

1 5 10 15

Ile His Val Pro Lys Cys Ala Ile Lys Arg Lys Leu Trp Glu Gln Ser

20 25 30

Leu Gly Cys Ser Leu Gly Glu Asn Ser Gln Ile Cys Asp Thr His Phe

35 40 45

Asn Asp Ser Gln Trp Lys Ala Ala Pro Ala Lys Gly Gln Thr Phe Lys

50 55 60

Arg Arg Arg Leu Asn Ala Asp Ala Val Pro Ser Lys Val Ile Glu Pro

65 70 75 80

Glu Pro Glu Lys Ile Lys Glu Gly Tyr Thr Ser Gly Ser Thr Gln tL

85 90 95

Glu Ser Cys Ser Leu Phe Asn Glu Asn Lys Ser Leu Arg Glu Lys Ile

100 105 110

Arg Thr Leu Glu Tyr Glu Met Arg Arg Leu Glu Gln Gln Leu Arg Glu

115 120 125

Ser Gln Gln Leu Glu Glu Ser Leu Arg Lys Ile Phe Thr Asp Thr Gln

130 135 140

Ile Arg Ile Leu Lys Asn Gly Gly Gln Arg Ala Thr Phe Asn Ser Asp

145 150 155 160

Asp Ile Ser Thr Ala Ile Cys Leu His Thr Ala Gly Pro Arg Ala Tyr

165 170 175

Asn His Leu Tyr Lys Lys Gly Phe Pro Leu Pro Ser Arg Thr Thr Leu

180 185 190

Tyr Arg Trp Leu Ser Asp Val Asp Ile Lys Arg Gly Cys Leu Asp Val

195 200 205

Val Ile Asp Leu Met Asp Ser Asp Gly Val Asp Asp Ira Asp Lys Leu

210 215 220

Cys Val Leu Ala Phe Asp Glu Met Lys Val Ala Ala Ala Phe Glu Tyr

225 230 235 240

Asp Ser Ser Ala Asp Ile Val Tyr Glu Pro Ser Asp Tyr Val Gln Leu

245 250 255

Ala Ile Val Arg Gly Leu Lys Lys Ser Trp Lys Gln Pro Val Phe Phe

260 265 270

Asp Phe Asn Thr Arg Met Asp Pro Asp Thr Leu Asn Asn Ile Leu Arg

275 280 285

Lys Leu His Arg Lys Gly Tyr Leu Val Val Ala Ile Val Ser Asp Leu

290 295 300

Gly Thr Gly Asn Gln Lys Leu Trp Thr Glu Leu lly Ile Ser Glu Ser

305 310 315 320

Lys Thr Trp Phe Ser His Pro Ala Asp Asp His Leu Lys Ile Phe Val

325 330 335

Phe Ser Asp Thr Pro His Leu Ile Lys Leu Val Arg Asn His Tyr Val

340 345 350

Asp Ser Gly Leu Thr Ile Asn Gly Lys Lys Leu Thr Lys Lys Thr Ile

355 360 365

Gln Glu Ala Leu His Leu Cys Asn Lys Ser Asp Leu Ser Ile Leu Phe

370 375 380

Lys Ile Asn Glu Asn His Ile Asn Val Arg Ser Leu Ala Lys Gln Lys

385 390 395 400

Val Lys Leu Ala Thr Gln Leu Phe Ser Asn Thr Thr Ala Ser Ser Ile

405 410 415

Arg Arg Cys Tyr Ser Leu Gly Tyr Asp Ile Glu Asn Ala Thr Glu Thr

420 425 430

Ala Asp Phe Phe Lys Leu Met Asn Asp Trp Phe Asp Ile Phe Asn Ser

435 440 445

Lys Leu Ser Thr Ser Asn Cys Ile Glu Cys Ser Gln Pro Tyr Gly Lys

450 455 460

Gln Leu Asp Ile Gln Asn Asp Ile Leu Asn Arg Met Ser Glu Ile Met

465 470 475 480

Arg Thr Gly Ile Leu Asp Lys Pro Lys Arg Leu Pro Phe Gln Lys Gly

485 490 495

Ile Ile Val Asn Asn Ala Ser Leu Asp Gly Leu Tyr Lys Tyr Leu Gln

500 505 510

Glu Asn Phe Ser Met Gln Tyr Ile Leu Thr Ser Arg Leu Asn Gln Asp

515 520 525

Ile Val Glu His Phe Phe Gly Ile Met Arg Ser Arg Gly Gly Gln Phe

530 535 540

Asp His Pro Thr Pro Leu Gln Phe Lys Tyr Arg Leu Arg Lys Tyr Ile

545 550 555 560

Ile Ala Arg Asn Thr Glu Met Leu Arg Asn Ser Gly Asn Ile Glu Glu

565 570 575

Gly Met Thr Asn Leu Lys Glu Cys Val Asn Lys Asn Val Ile Pro Asp

580 585 590

Asn Ser Glu Ser Trp Leu Asn Leu Asp Phe Ser Ser Lys Glu Asn Glu

595 600 605

Asn Lys Ser Lys Asp Asp Glu Pro Val Asp Asp Glu Pro Val Asp Glu

610 615 620

Met Leu Ser Asn Ile Asp Phe Thr Glu Met Asp Glu Leu Thr Glu Asp

625 630 635 640

Ala Met Glu Tyr Ile Ala Gly Tyr Val Ile Lys Lys Leu Arg Ile Ser

645 650 655

Asp Lys Val Lys Glu Asn Leu Thr Phe Thr Tyr Val Asp Glu Val Ser

660 665 670

His Gly Gly Leu Ile Lys Pro Ser Glu Lys Phe Gln Glu Lys Leu Lys

675 680 685

Glu Leu Glu Cys Ile Phe Leu His Tyr Thr Asn Asn Asn Asn Phe Glu

690 695 700

ls P Val Asp Lys Gln Val Lys Ser Phe Tyr Phe Lys Ile Arg Ile Tyr

705 710 715 720

Ile Thr Asn Asn Val Lys Glu Lys Leu Ile Leu Ala Ala Arg Asn Val

725 730 735

Phe Arg Ile Lys Tyr Phe Asn Lys Lys Ile Glu Ile Lys Asn Gln Lys

740 745 750

Gln Lys Leu Ile Gly Asn Ser Lys Leu Leu Lys Ile Lys Leu

755 760 765



<210> 121 
<211> 103 
<212> PRT 
<213> Homo sapiens 

<400> 121

Asp Glu Lu Cys Val Val Cys Gly Asp Lys Ala Thr Gly Tyr His Tyr

1 5 10 15

Arg Cys Ile Thr Cys Glu Gly Cys Lys Gly Phe Phe Arg Arg Thr Ile

20 25 30

Gln Lys Asn Leu His Pro Ser Tyr Ser Cys Lys Tyr Glu Gly Lys Cys

35 40 45

Val Ile Asp Lys Val Thr Arg Asn Gln Cys Gln Glu Cys Arg Phe Lys

50 55 60

Lys Cys Ile Tyr Val Gly Met Ala Thr Asp Leu Val Leu Asp Asp Ser

65 70 75 80

Lys Arg Leu Ala Lys Arg Lys Leu Ile Glu Glu Asn Arg Glu Lys Arg

85 90 95

Arg Arg Glu Glu Leu Glu Lys

100
<210> 122

<211> 81

<212> PRT

<213> Homo sapiens

<400> 122

Met Lys Pro Ala Arg Pro Cys Leu Val Cys Ser Asp Glu Ala Ser Gly

1 5 10 15

Cys His Tyr Gly Val Leu Thr Cys Gly Ser Cys Lys Val Phe Phe Lys

20 25 30

Arg Ala Val Glu Gly Gln His Asn Tyr Leu Cys Ala Gly Arg Asn Asp

35 40 45

Cys Ile Ile Asp Lys Ile Arg Arg Lys Asn Cys Pro Ala Cys Arg Tyr

50 55 60

Arg Lys Cys Leu Gln Ala Gly Met Asn Leu Glu Ala Arg Lys Thr Lys

65 70 75 80

Lys



<210> 123 

<211> 89 

<212> PRT 

<213> Homo sapiens 



<400> 123 



Met Val Gln Ser Cys Ser Ala Tyr Gly Cys Lys Asn Arg Tyr Asp Lys

1 5 10 15

Asp Lys Pro Val Ser Phe His Lys Phe Pro Leu Thr Arg Pro Ser Leu

20 25 30

Cys Lys Glu Trp Glu Ala Ala Val Arg Arg Lys Asn Phe Lys Pro Thr

35 40 45

Lys Tyr Ser Ser Ile Cys Ser Glu His Phe Thr Pro Asp Cys Phe Lys

50 55 60

Arg Glu Cys Asn Asn Lys Leu Leu Lys Glu Asn Ala Val Pro Thr Ile

65 70 75 80

Phe Leu Cys Thr Glu Pro His Asp Lys

85



<210> 124 
<211> 85 
<212> PRT 

<213> Drosophila melanogaster 
<400> 124 



Met Lys Tyr Cys Lys Phe Cys Cys Lys Ala Val Thr Gly Val Lys Leu

1 5 10 15

Ile His Val Pro Lys Cys Ala Ile Lys Arg Lys Leu Trp Glu Gln Ser

20 25 30

Leu Gly Cys Ser Leu Gly Glu Asn Ser Gln Ile Cys Asp Thr His Phe

35 40 45

Asn Asp Ser Gln Trp Lys Ala Ala Pro Ala Lys Gly Gln Thr Phe Lys

50 55 60

Arg Arg Arg Leu Asn Ala Asp Ala Val Pro Ser Lys Val Ile Glu Pro

65 70 75 80

Glu Pro Glu Lys Ile

85
<210> 125
<211> 58
<212> PRT

<213> Artificial Sequence
<220>

<223> THAP Domain consensus

<222> 3-4, 7, 9, 13-17, 19, 21-23, 25-26, 28, 35, 38-39, 41, 45-50,
52, 55-56
<223> Xaa = any of the twenty amino acids

<400> 125

Met Val Xaa Xaa Cys Ser Xaa Tyr Xaa Cys Lys Asn Xaa Xaa Xaa Xaa

1 5 10 15

Xaa Lys Xaa Val Xaa Xaa Xaa Lys Xaa Xaa Leu Xaa Arg Pro Ser Leu

20 25 30

Cys Lys Xaa Trp Glu Xaa Xaa Val Xaa Arg Lys Asn Xaa Xaa Xaa Xaa

35 40 45

Xaa Xaa Ser Xaa Ile Cys Xaa Xaa His Phe

50 55



<210> 126 
<211> 89 
<212> PRT 

<213> Homo sapiens 

<400> 126

Met Val Gln Ser Cys Ser Ala Tyr Gly Cys Lys Asn Arg Tyr Asp Lys

1 5 10 15

Asp Lys Pro Val Ser Phe His Lys Phe Pro Leu Thr Arg Pro Ser Leu

20 25 30

Cys Lys Glu Trp Glu Ala Ala Val Arg Arg Lys Asn Phe Lys Pro Thr

35 40 45

Lys Tyr Ser Ser Ile Cys Ser Glu His Phe Thr Pro Asp Cys Phe Lys

50 55 60

Arg Glu Cys Asn Asn Lys Leu Leu Lys Glu Asn Ala Val Pro Thr Ile

65 70 75 80

Phe Leu Cys Thr Glu Pro His Asp Lys

85



<210> 127 
<211> 89 
<212> PRT 

<213> Homo sapiens 

<400> 127

Met Pro Lys Ser Cys Ala Ala Arg Gln Cys Cys Asn Arg Tyr Ser Ser

1 5 10 15

Arg Arg Lys Gln Leu Thr Phe His Arg Phe Pro Phe Ser Arg Pro Glu

20 25 30

Leu Leu Lys Glu Trp Val Leu Asn Ile Gly Arg Gly Asn Phe Lys Pro

35 40 45

Lys Gln His Thr Val Ile Cys Ser Glu His Phe Arg Pro Glu Cys Phe

50 55 60

Ser Ala Phe Gly Asn Arg Lys Asn Leu Lys His Asn Ala Val Pro Thr

65 70 75 80

Val Phe Ala Phe Gln Asp Pro Thr Gln

85



<210> 128 

<211> 90 

<212> PRT 

<213> Homo sapiens 

<400> 128 



Met Pro Arg Tyr Cys Ala Ala Ile Cys Cys Lys Asn Arg Arg Gly Arg

1 5 10 15

Asn Asn Lys Asp Arg Lys Leu Ser Phe Tyr Pro Phe Pro Leu His Asp

20 25 30

Lys Glu Arg Leu Glu Lys Trp Leu Lys Asn Met Lys Arg Asp Ser Trp

35 40 45

Val Pro Ser Lys Tyr Gln Phe Leu Cys Ser Asp His Phe Thr Pro Asp

50 55 60

Ser Leu Asp Ile Arg Trp Gly Ile Arg Tyr Leu Lys Gln Thr Ala Val

65 70 75 80

Pro Thr Ile Phe Ser Leu Pro Glu Asp Asn

85 90



<210> 129 

<211> 92 

<212> PRT 

<213> Homo sapiens 

<400> 129 



Met Pro Lys Tyr Cys Arg Ala Pro Asn Cys Ser Asn Thr Ala Gly Arg

1 5 10 15

Leu Gly Ala Asp Asn Arg Pro Val Ser Phe Tyr Lys Phe Pro Leu Lys

20 25 30

Asp Gly Pro Arg Leu Gln Ala Trp Leu Gln His Met Gly Cys Glu His

35 40 45

Trp Val Pro Ser Cys His Gln His Leu Cys Ser Glu His Phe Thr Pro

50 55 60

Ser Cys Phe Gln Trp Arg Trp Gly Val Arg Tyr Leu Arg Pro Asp Ala

65 70 75 80

Val Pro Ser Ile Phe Ser Arg Gly Pro Pro Ala Lys

85 90



<210> 130 

<211> 90 

<212> PRT 

<213> Homo sapiens 



<400> 130 



Met Val Ile Cys Cys Ala Ala Val Asn Cys Ser Asn Arg Gln Gly Lys

1 5 10 15

Gly Glu Lys Arg Ala Val Ser Phe His Arg Phe Pro Leu Lys Asp Ser

20 25 30

Lys Arg Leu Ile Gln Trp Leu Lys Ala Val Gln Arg Asp Asn Trp Thr

35 40 45

Pro Thr Lys Tyr Ser Phe Leu Cys Ser Glu His Phe Thr Lys Asp Ser

50 55 60

Phe Ser Lys Arg Leu Glu Asp Gln His Arg Leu Leu Lys Pro Thr Ala

65 70 75 80

Val Pro Ser Ile Phe His Leu Thr Glu Lys

85 90



<210> 131 
<211> 89 
<212> PRT 

<213> Homo sapiens 
<400> 131 



Met Pro Thr Asn Cys Ala Ala Ala Gly Cys Ala Thr Thr Tyr Asn Lys

1 5 10 15

As Ile Asn Ile Ser Phe His Arg Phe Pro Leu Asp Pro Lys Arg Arg

20 25 30

Lys Glu Trp Val Arg Leu Val Arg Arg Lys Asn Phe Val Pro Gly Lys

35 40 45

His Thr Phe Leu Cys Ser Lys His Phe Glu Ala Ser Cys Phe Asp Leu

50 55 60

Thr Gly Gln Thr Arg Arg Leu Lys Met Asp Ala Val Pro Thr Ile Phe

65 70 75 80

Asp Phe Cys Thr His Ile Lys Ser Met

85
85 



<210> 132 
<211> 90 
<212> PRT 

<213> Homo sapiens 



<400> 132

Met Pro Asn Phe Cys Ala Ala Pro Asn Cys Thr Arg Lys Ser Thr Gln

1 5 10 15

Ser Asp Leu Ala Phe Phe Arg Phe Pro Arg Asp Pro Ala Arg Cys Gln

20 25 30

Lys Trp Val Glu Asn Cys Arg Arg Ala Asp Leu Glu Asp Lys Thr Pro

35 40 45

Asp Gln Leu Asn Lys His Tyr Arg Leu Cys Ala Lys His Phe Glu Thr

50 55 60

Ser Met Ile Cys Arg Thr Ser Pro Tyr Arg Thr Val Leu Arg Asp Asn

65 70 75 80

Ala Ile Pro Thr Ile Phe Asp Leu Thr Ser

85 90



<210> 133 
<211> 97 
<212> PRT 

<213> Homo sapiens 
<400> 133 



Met Pro Arg His Cys Ser Ala Ala Gly Cys Cys Thr Arg Asp Thr Arg

1 5 10 15

Glu Thr Arg Asn Arg Gly Ile Ser Phe His Arg Leu Pro Lys Lys Asp

20 25 30

Asn Pro Arg Arg Gly Leu Trp Leu Ala Asn Cys Gln Arg Leu Asp Pro

35 40 45

Ser Gly Gln Gly Leu Trp Asp Pro Ala Ser Glu Tyr Ile Tyr Phe Cys

50 55 60

Ser Lys His Phe Glu Glu Asp Cys Phe Glu Leu Val Gly Ile Ser Gly

65 70 75 80

Tyr His Arg Leu Lys Glu Gly Ala Val Pro Thr Ile Phe Glu Ser Phe

85 90 95

Ser



<210> 134 

<211> 92 

<212> PRT 

<213> Homo sapiens 



<400> 134 



Met Thr Arg Ser Cys Ser Ala Val Gly Cys Ser Thr Arg Asp Thr Val

1 5 10 15

Leu Ser Arg Glu Arg Gly Leu Ser Phe His Gln Phe Pro Thr Asp Thr

20 25 30

Ile Gln Arg Ser Lys Trp Ile Arg Ala Val Asn Arg Val Asp Pro Arg

35 40 45

Ser Lys Lys Ile Trp Ile Pro Gly Pro Gly Ala Ile Leu Cys Ser Lys

50 55 60

His Phe Gln Glu Ser Asp Phe Glu Ser Tyr Gly Ile Arg Arg Lys Leu

65 70 75 80

Lys Lys Gly Ala Val Pro Ser Val Ser Leu Tyr Lys

85 90



<210> 135 
<211> 96 
<212> PRT 

<213> Homo sapiens

<400> 135

Met Val Lys Cys Cys Ser Ala Ile Gly

1 5

Asn Ser Lys Leu Lys Gly Leu Thr Phe His Val Phe Pro Thr Asp Glu

20 25 30

Asn Ile Lys Arg Lys Trp Val Leu Ala Met Lys Arg Leu Asp Val Asn

35 40 45

Ala Ala Gly Ile Trp Glu Pro Lys Lys Gly Asp Val Leu Cys Ser Arg

50 55 60

His Phe Lys Lys Thr Asp Phe Asp Arg Ser Ala Pro Asn Ile Lys Leu

65 70 75 80

Lys Pro Gly Val Ile Pro Ser Ile Phe Asp Ser Pro Tyr His Leu Gln

85 90 95

<210> 136

<211> 90

<212> PRT

<213> Homo sapiens

<400> 136

Met Pro Gly Phe Thr Cys Cys Val Pro Gly Cys Tyr Asn Asn Ser His

1 5 10 15

Arg Asp Lys Ala Leu His Phe Tyr Thr Phe Pro Lys Asp Ala

20 25 30

Arg Arg Leu Trp Leu Lys Asn Val Ser Arg Ala Gly Val Ser

35 40 45

Phe Ser Thr Phe Gln Pro Thr Thr Gly His Arg Leu Cys Ser

50 55 60

Phe Gln Gly Gly Arg Lys Thr Tyr Thr Val Arg Val Pro Thr

65 70 75

Pro Leu Arg Gly Val Asn Glu Arg Lys Val

85 90
<210> 137

<211> 90

<212> PRT

<213> Homo sapiens

<400> 137

Met Pro Ala Arg Cys Val Ala Ala His Cys Gly Asn Thr Thr Lys Ser

1 5 10 15

Gly Lys Ser Leu Phe Arg Phe Pro Lys Asp Arg Ala Val Arg Leu Leu

20 25 30

Trp Asp Arg Phe Val Arg Gly Cys Arg Ala Asp Trp Tyr Gly Gly Asn

35 40 45

Asp Arg Ser Val Ile Cys Ser Asp His Phe Ala Pro Ala Cys Phe Asp

50 55 60

Val Ser Ser Val Ile Gln Lys Asn Leu Arg Phe Ser Gln Arg Leu Arg

65 70 75 80

Leu Val Ala Gly Ala Val Pro Thr Leu His

85 90



<210> 138 
<211> 85 
<212> PRT 

<213> Drosophila melanogaster 

<400> 138

Met Lys Tyr Cys Lys Phe Cys Cys Lys Ala Val Thr Gly Val Lys Leu

1 5 10 15

Ile His Val Pro Lys Cys Ala Ile Lys Arg Lys Leu Trp Glu Gln Ser

20 25 30

Leu Gly Cys Ser Leu Gly Glu Asn Ser Gln Ile Cys Asp Thr His Phe

35 40 45

Asn Asp Ser Gln Trp Lys Ala Ala Pro Ala Lys Gly Gln Thr Phe Lys

50 55 60

Arg Arg Arg Leu Asn Ala Asp Ala Val Pro Ser Lys Val Ile Glu Pro

65 70 75 80

Glu Pro Glu Lys Ile

85



<210> 139 
<211> 63 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> THAP Domain consensus 

<222> 4-5, 7, 9-10, 12, 15-20, 22, 24, 32, 35, 38-39, 42-43, 46-47,
49-51, 53-61, 63

<223> Xaa = any of the twenty amino acids

<400> 139

Met Pro Lys Xaa Xaa Cys Xaa Ala Xaa Xaa Cys Xaa Asn Arg Xaa Xaa

1 5 10 15

Xaa Xaa Xaa Xaa Lys Xaa Lys Xaa Val Ser Phe His Lys Phe Pro Xaa

20 25 30

His Asp Xaa His Asp Xaa Xaa Arg Arg Xaa Xaa Trp Val Xaa Xaa Val

35 40 45

Xaa Xaa Xaa Arg Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Trp Xaa

50 55 60

<210> 140 
<211> 17 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> DR-5-related sequence 
<400> 140 

gggcatacta ctggcaa 17 

<210> 141 
<211> 17 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> DR-5-related sequence 
<400> 141 

gggcaaactg tgggcat 17 

<210> 142 
<211> 17 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> DR-5-related sequence 
<400> 142

gggcatacta ctggcaa 17 

<210> 143 
<211> 17 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> DR-5-related sequence 
<400> 143 

gggcaaacta ctggcaa 17 

<210> 144 
<211> 17 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> DR-5-related sequence 
<400> 144 

gggccagttc gttgcaa 17 

<210> 145

<211> 16 
<212> DNA 
<213> Artificial Sequence



<220> 

<223> DR-5-related sequence 

<400> 145

gggcatgtac tggcaa 16

<210> 146 
<211> 16 
<212> DNA 

<213> Artificial Sequence 



<220> 

<223> DR-5-related sequence 

<400> 146

gggcaactgt gggcaa 16

<210> 147 
<211> 18 
<212> DNA 

<213> Artificial Sequence 



<220> 

<223> DR-5-related sequence 

<400> 147

gggcaacact actggcaa 18

<210> 148 
<211> 17 
<212> DNA 

<213> Artificial Sequence 



<220> 

<223> DR-5-related sequence 

<400> 148

gggcaaagta ctggcaa 17

<210> 149 
<211> 17 
<212> DNA 

<213> Artificial Sequence 



<220> 

<223> DR-5 consensus sequence 



<221> unsure 
<222> 7-11 

<223> n = any of the four nucleotides

<400> 149

gggcaannnn ntggcaa 17

<210> 150 
<211> 23 

<212> DNA

<213> Artificial Sequence 
<220> 

<223> ER-11-related sequence
<400> 150 

ttgccagtac taagtgtggg caa 

<210> 151 
<211> 23 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> ER-11-related sequence
<400> 151

ctgccagtac atagtgtggg caa 

<210> 152 
<211> 23 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> ER-11-related sequence
<400> 152 

ttgccagtac taagtgtggg caa 

<210> 153 
<211> 23 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> ER-11-related sequence
<400> 153 

ctgccagtag atactgtggg caa 

<210> 154 
<211> 24 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> ER-11-related sequence
<400> 154 

ttgccagtag ttaggtgtgg gcga 

<210> 155 
<211> 23 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> ER-11-related sequence



<400> 155

ttgccagtag ttagtgtggg caa 23

<210> 156 
<211> 23 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> ER-11-related sequence

<400> 156

ttgccagtac ctactaaggg caa 23

<210> 157 
<211> 23 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> ER-11-related sequence

<400> 157

ttgccagtag ttagtgtggg cag 23

<210> 158 
<211> 23 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> ER-11-related sequence

<400> 158

ctgccagtag taagtgtggg cag 23

<210> 159 
<211> 23
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> ER-11 consensus sequence

<221> unsure
<222> 7-17

<223> n = any of the four nucleotides

<400> 159

ttgccannnn nnnnnnnggg caa 23

<210> 160 
<211> 642 
<212> DNA 
<213> Homo sapiens 

IZI^I c= t?f f ? = f a ?|S c = -J,™ ™« JO 
tctttccaca agtttcctct tactcg ^^ a acaatattt attcaqagca ctttactcca 180 

SSSSS SSSSS SSSS SUA* .*. 2«. 

tttctttgta ctgagccaca tgacaagaaa gaagatcttc tggagccaca ggaacagctt 300
cccccacctc ctttaccgcc tcctgtttcc caggttgatg ctgctattgg attactaatg 360
ccgcctcttc agacccctgt taatctctca gttttctgtg accacaacta tactgtggag 420 
gatacaatgc accagcggaa aaggattcat cagctagaac agcaagttga aaaactcaga 480
aagaagctca agaccgcaca gcagcgatgc agaaggcaag aacggcagct tgaaaaatta 540
aaggaggttg ttcacttcca gaaagagaaa gacgacgtat cagaaagagg ttatgtgatt 600 
ctaccaaatg actactttga aatagttgaa gtaccagcat aa 642

<210> 161 
<211> 687 
<212> DNA 
<213> Homo sapiens 

<400> 161 

atgccgacca attgcgctgc ggcgggctgt gccactacct acaacaagca cattaacatc 60
agcttccaca ggtttccttt ggatcctaaa agaagaaaag aatgggttcg cctggttagg 120 
cgcaaaaatt ttgtgccagg aaaacacact tttctttgtt caaagcactt tgaagcctcc 180 
tgttttgacc taacaggaca aactcgacga cttaaaatgg atgctgttcc aaccattttt 240 
gatttttgta cccatataaa gtctatgaaa ctcaagtcaa ggaatctttt gaagaaaaac 300 
aacagttgtt ctccagctgg accatctaat ttaaaatcaa acattagtag tcagcaagta 360 
ctacttgaac acagctatgc ctttaggaat cctatggagg caaaaaagag gatcattaaa 420
ctggaaaaag aaatagcaag cttaagaaga aaaatgaaaa cttgcctaca aaaggaacgc 480
agagcaactc gaagatggat caaagccacg tgtttggtaa agaatttaga agcaaatagt 540 
gtattaccta aaggtacatc agaacacatg ttaccaactg ccttaagcag tcttcctttg 600 
gaagatttta agatccttga acaagatcaa caagataaaa cactgctaag tctaaatcta 660 
aaacagacca agagtacctt catttaa 687 

<210> 162 
<211> 720 
<212> DNA 
<213> Homo sapiens 

<400> 162 

atgccgaagt cgtgcgcggc ccggcagtgc tgcaaccgct acagcagccg caggaagcag 60 
ctcaccttcc accggtttcc gttcagccgc ccggagctgc tgaaggaatg ggtgctgaac 120 
atcggccggg gcaacttcaa gcccaagcag cacacggtca tctgctccga gcacttccgg 180 
ccagagtgct tcagcgcctt tggaaaccgc aagaacctaa agcacaatgc cgtgcccacg 240
gtgttcgcct ttcaggaccc cacacagcag gtgagggaga acacagaccc tgccagtgag 300 
agaggaaatg ccagctcttc tcagaaagaa aaggtcctcc ctgaggcggg ggccggagag 360 
gacagtcctg ggagaaacat ggacactgca cttgaagagc ttcagttgcc cccaaatgcc 420
gaaggccacg taaaacaggt ctcgccacgg aggccgcaag caacagaggc tgttggccgg 480
ccgactggcc ctgcaggcct gagaaggacc cccaacaagc agccatctga tcacagctat 540 
gcccttttgg acttagattc cctgaagaaa aaactcttcc tcactctgaa ggaaaatgaa 600 
aagctccgga agcgcttgca ggcccagagg ctggtgatgc gaaggatgtc cagccgcctc 660 
cgtgcttgca aagggcacca gggactccag gccagacttg ggccagagca gcagagctga 720 



<210> 163 

<211> 1734 

<212> DNA 

<213> Homo sapiens 

<400> 163 

atggtgatct gctgtgcggc cgtgaactgc tccaaccggc agggaaaggg cgagaagcgc 60 
gccgtctcct tccacaggtt ccccctaaag gactcaaaac gtctaatcca atggttaaaa 120 
gctgttcaga gggataactg gactcccact aagtattcat ttctctgtag tgagcatttc 180 
accaaagaca gcttctccaa gaggctggag gaccagcatc gcctgctgaa gcccacggcc 240
gtgccatcca tcttccacct gaccgagaag aagagggggg ctggaggcca tggccgcacc 300 
cggagaaaag atgccagcaa ggccacaggg ggtgtgaggg gacactcgag tgccgccacc 360 
ggcagaggag ctgcaggttg gtcaccgtcc tcgagtggaa acccgatggc caagccagag 420 
tcccgcaggt tgaagcaagc tgctctgcaa ggtgaagcca cacccagggc ggcccaggag 480 
accgccagcc aggagcaggc ccagcaagct ctggaacgga ctccaggaga tggactggcc 540
accatgSgg caggcagtca gggaaaagca gaagcgtctg ccacagatgc tggcgatgag 600 
agcgcclctfcctSatSga agggggcgtg acagataaga gtggcatttc tatggatgac 660 
tttacgcccc caggatctgg ggcgtgcaaa tttatcggct cacttcattc gtacagtttc 720 
tccSaagc acacccgaga aaggccatct gtcccccgag agcccattga ccgcaagagg 780 
ctgaagaaag atgtggaacc aagctgcagt gggagcagcc tgggacccga caagggcctg 840
gcccagagcc ctcccagctc atcacttacc gcgacaccgc agaagccttc ccagagcccc 900
?ctgScStc ctgccgacgt caccccaaag ccagccacgg aagccgtgca gagcgagcac 960 
agcgacgcca gccccatg?c catcaacgag gtcatcctgt cggcgtcagg W«*gcaag 1020 
c?cltcgact cactgcactc ctactgcttc tcctcccggc agaacaagag ccaggtgtgc 1080 
?qcctgcggg agcaggtgga gaagaagaac ggcgagctga agagcctgcg gcagagggtc 114 0 
aqccgct?cg acagccaggt gcggaagcta caggagaagc tggatgagct gaggagagtg 1200 
agcg?cccc? atccaag?ag cctgctgtcg cccagccgcg agccccccaa gatgaaccca 1260 
g?ggtggagc cactgtcctg gatgctgggc acctggctgt cggacccacc tggagccggg 1320 
acctacccca cactgcagcc cttccagtac ctggaggagg ttcacatctc ccacgtgggc 1380 
caqcccaSc ?gaa2ttltc gttcaactcc ttccacccgg acacgcgcaa gccgatgcac 1440 
aglgagtg?g gcttcattcg cctcaagccc gacaccaaca aggtggcctt tgtcagcgcc 1500 
Sgaacacag gcgtggtgga agtggaggag ggcgaggtga acgggcagga gctgtgcatc 1560 
acatcccact ccatcgccag gatctccttc gccaaggagc cccacgtaga gcagatcacc 1620 
cggaagttca ggotglattc tgaaggcaaa cttgagcaga cggtctccat ggaaccacg 1680 
acacagccaa tgactcagca tcttcacgtc acctacaaga aggtgacccc gtaa 1734 

<210> 164 

<211> 1188 

<212> DNA 

<213> Homo sapiens 

atqccccgct attgcgcagc gatttgttgt aagaaccgcc ggggacgaaa caataaagac 60 
cqqaagc?ga gtt?ttatcc atttcctcta catgacaaag aaagactgga aaagtggtta 120 
aaqaataSa agcgagattc atgggttccc agtaaatacc agtttctatg tagtgaccat 180 
tSctcSg actctcttga catoagatgg ggtattcgat atttaaaaca -ctgcagtt 240 
ccaacaatat tttctttgcc tgaagacaat cagggaaaag acccttctaa aaaaaaatcc 300
caglagaaaa acttggaaga tgagaaagaa gtatgcccaa aagccaagtc agaagaatca 360
tttgtattaa atgagacaaa gaaaaatata gttaacacag atgtgcccca tcaacatcca 420
gaa?tacttc attcatcttc cttggtaaag ccaccagctc ccaaaacagg aagtatacaa 480 
aataacatgt taactcttaa tctagttaaa caacatactg ggaaaccaga atctaccttg 540 
aaacatcfg ttaaccaaga tacaggtaga ggtggttttc acacatgttt tgagaatcta 600 
aattctacaa ctattacttt gacaacttca aattcagaaa gtattcatca atctttggaa 660
actcaaaaag Scttgaagt aactaccagt catcttgcta atccaaactt tacaagtaat 720 
tccSggaaa taaag?cagc acaggaaaat ccattcttat tcagcacaat taatcaaaca 780 
qttgaagaat SaalacaL taaagaatct gttattgcca tttttgtacc tgctgaaaat 840 
?ctaaaccct cagttaattc ttttatatct gcacaaaaag aaaccacgga aatggaagac 900 
acagacattg aagactcctt gtataaggat gtagactatg ggacagaagt tttacaaatc 960
qaacattct? aSgcagaca agatataaat aaggaacatc tttggcagaa agtctctaag 1020 
Xaclttcaa aga?aactct tctagagtta aaagagcaac aaactctagg tagattgaag 1080 
Sttggalg ctcttataag gcagctaaag oaggaaaact ggctatctga agaaaacgtc 1140 
aagattatag aaaaccattt tacaacatat gaagtcacta tgatatag 1188



<210> 165 
<211> 669 
<212> DNA 
<213> Homo sapiens 

<400> 165 



atggtgaaat gctgctccgc cattggatgt gcttctcgct gcttgccaaa ttcgaagtta 60 
aaaggactga catttcacgt attccccaca gatgaaaaca tcaaaaggaa atgggtatta 120 
gcaatgaaaa gacttgatgt gaatgcagcc ggcatttggg agcctaaaaa aggagatgtg 180 
?tgtgttcga ggcactttaa gaagacagat tttgacagaa gtgctccaaa tattaaactg 240 
aaacctggag tcataccttc tatctttgat tctccatatc acctacaggg gaaaagagaa 300 
aaacttcatt gtagaaaaaa cttcaccctc aaaaccgttc cagccactaa ctacaatcac 360 
a C gcta2^ "Legale Sf^T ^ ttttgaacat 420

gagctagagg atacaS^^^^^gS?,^ 9 ^ 480 
aaatcattga ggaagacaat caSaaaS? 9 ™ 5 ? accgagaaaa acgttttcag 540 
gcaaatagac tggacaSS SStaaoS ?2? gtctgatca S ccaagaaaca 600 

atttcatga ^tgttgggac tgttgtcagg agagcataga acaggactat 660 



<210> 166 
<211> 930 
<212> DNA 
<213> Homo sapiens 

<400> 166 



SSESJ alSJSS «-="°«- 60 

gccaactgcc agcggctqaa ccccJ™ aaggacaacc cgaggcgagg cttgtggctg 120 
atctacttct gctSSlS cSS™* ca 5p c ctgt gggacccggc atccgagtac 180 
tatcacaggc Sagg^ggg ggcaS?cccc acca?^ ^f 9 *.*" aatcagtgga 240 
cggacaacca agaccaaagg SS£t?ac 9 agtctttctc caagttgcgc 300 

agacgatgca ggaagcgSt? ctccoKSn CCacctggcc cccctgaagt cagccggctc 360 
ccacctgctg SgtJacctg ct?tcc???g SSSSS ° aaCtCCatt ttctccacct 4 20 
gcctccccag ctjggaggc? altacctllr- gaagaggcct ca 5^cctgc cactttgccg 4 80 
cccttgggtg ccSJgSJa ?Sagcaoac SSl?* 000 CCttttcaga cctactgggc 540 
ccctcccctc tcgalccalg glct?tc?cc 5^2 9 f a g cc ttcacc agagcggcag 600 
cccgccggag cc?acatccl oS?XS= C ^^S^ atatgctgcg cctgccccca 660 
aagcggSgag ccgaggcSc StXtocc SS^** 99 tgggcagc 9 c cttactctgg 720 
tgcaagcggc gggagcagS qcta2aa??a tll^^ ccca ^ cca gctgcaggcc 780 
cgggagalgc Sgclcgg? aga?gcccgc cagac?^ agctgcagca ^agcgggca 840 
gccatgcagc tgagcagclg cltggcctga CagaCtctga aggagcatgt gcaggacttt 900 

<210> 167 
<211> 825 
<212> DNA 

<213> Homo sapiens 
<400> 167 

aac'cgccSg Jg£c£S£ S£?££ ^aacactg cgggccgcct gggtgcagac 60 
ctgcagcaca tgggctgtga gcactaacta o^** 9 ? 3 * 9 ^tccccggct gcaggcctgg 120 
cacttcacac o2£ct£S IcagtgSca? SSS??* aCCagcactt ^tgcagcgag 180 
gtgccctcca tcttctcccg JggJccacct 2S£SS gctacctgcg gcctgatgca 240 
cagaagccag tctcgccgcc gSccccta SSsSff, a ^gaggac ccgaagcacc 300 
gccatcccag tctctggccc agtoSccta alnl^ t cacccc tgcc ccagagccct 360 
aagactgtgg ccacca?gct ccSScccc ffiSS 99 gCCCcacatc ggggagcccc 4 20 
caacctgaag tccctcrccca ctggcccctg cgccaactcc tgagcggtca 4 80 

caacgccggg tgcgjlggct aC * 9ggctgg ^ccagtgct gggagcactg 540 

ctggaacjg? tggclctgca gcScJ™ S^ 9 * 90 ? 90 acca ^ c ? ca gctgcaggcc 600 
ctgcagcgcc tglcaaclgc ccaaScc??? I 9 ° ^^gggc acgccggggt 660 

atctgtggag ggcctgacft aaS^S ^^tgagg aatcccaaac cttcaccatc 720 
gatg2ca 9 a 9 g? cggagXS ££££ 2SSS cSf^ * ™ 



<210> 168 

<211> 3171 

<212> DNA 

<213> Homo sapiens 

<400> 168 



SSS" SSSS SS — «o 

gctgttaatc gtgtggaccc ciaSSS S^JS a 9=9=tcaaa atggatcagg 120 
c t g t g ttcoa aa!a!! t S Xt~ S^SSS ~g -S 
aaaaaaaaag ctgtgccttc tgtttctcta tacaagattc ctcaaggtgt acatcttaaa 300
aXaaaaXa gacaaaaaat cStaaaacaa cctcttccag acaattctca agaagttgct 360 

S 25SS S5S= =S=S S£5= 5£2g 

Sis sbs ssssss sass: » i 

aattggagag aaacagatga gtactccgca gaaatgaaac aatttgcatg tacactctac 660
??gtgcagta gcaaagtcta tgattatgta agaaagattc ttaagctgcc tcattcttcc 720 
aJcc?caqaa cgtggttatc caaatgccaa cccagtccag gtttcaacag caacattttt 780 
ISSSSS aacgaagagt agagaatgga gatcagctct atcaatactg ttcattgtta 840 
ataaaaagta tacctctcaa gcaacagctt cagtgggatc ctagcagtca cagtttgcag 900
gggtS?gg actttggtct Iggaaaactt gatgctgatg aaacgccact tgcttcagaa 960 
actqttttgt taatggcagt gggtattttt ggccattgga gaacacctct tggttatttt 1020 
Stgtaaaca gagca?ctgg atatttgcag gctcagctgc ttcgtctgac tattggtaaa 1080 
ctgagtgaca ?aggaatcac agttctggct gttacatctg atgccacagc jcatagtgtt 1140 
cagatggcaa aagcattggg gatacatatt gatggagacg acatgaaatg tacatttcag 1200 
ca?cc??cat cttctagtca acagattgca tacttctttg actcttgcca cttgctaaga 1260 
t?aataagaa atgcatttca gaattttcaa agcattcagt ttattaatgg tatagcacat 1320 
tqqcaqcacc tcgtggagtt agtagcactg gaggaacagg aattatcaaa tatggaaaga 1380 
aJIccaagta cact?gcLa t?tglaaaat catgtactga aagtgaatag tgccacccaa 1440 
ctStSagtg agagtgtagc cagtgcatta gaatatttgt tatccttaga cctgccacct 1500 
tttcaaaac? gLLggtac catccatttt ttacgtttaa ttaacaatct gttgacatc 560 
tttaatagta ggaactgtta tggaaaggga cttaaagggc ctctgttgcc tgaaacttac 1620 
agtaaaataa accacgtgtt aattgaagcc aagactattt "gttacatt ^ctgacact lb 
agcaataatc aaataattaa aggtaagcaa aaactaggat tcctgggatt "tgctcaat J 
actaaaagct taaaatggct ctaccaaaat tatgttttcc caaaggtcat gccttttcct 1800
?SS?c?ga cttacaaatt cagtcatgat catctggaat tatttctaaa ^gcttagg I860 
caggtattag taacaagttc tagccctacc tgcatggcat tccagaaagc ttactataat 1920 
tiggagacca gatacaaatt tcaagatgaa gtttttctaa gcaaagtaag catctttgac 1980 
atttcaattg ctcgaaggaa agacttggcg ctttggacag ttcaacgtca gtatggtgtc 2040 
agcgScaa agactg?ctt tcacgaagag ggtatttgtc aagactggtc tattgttca 2100 
ctaagtgagg cattactaga cctgtcagat cataggcgaa atctcatctg ttatgctggt 2160
SStwS acaagtta?c agctctttta acttgtgagg actgcatcac tgcactgtat 2220 
qSa?cgqa?c tcaalgcctc taaaattggg tcactattat ttgttaaaaa gaagaatggt 2280 
??gca???tc cttcagaaag tctgtgtcgg gtcataaata tttgtgagcg ^tgtaaga 2340 
acccattcaa qaatggcaat ttttgaacta gtttctaaac aaagggaatt gtatcttcaa 2400 
caaSaaata? ?a?g?gagct ttctgggcat attgatcttt ttgtagatgt gaataagcat 2460 
SctSgatg gagaagtgtg tgccatcaat cactttgtca agttgctaaa ^Jtataata 520 
atctatttct taaatatcag agctaaaaat gttgcacaga atcctttaaa acatcattca 2580
aaaaoaactg aJatgaaaac tttatcaagg aaacactggt cacctgtaca ggattataaa 2640 
?q?Saaqt? ttgctaatac cagtagtaaa ttcaggcatt tgctaagtaa cgatggatat 2700 
cSttcaaat gagagaccta aaatatatta acattttaat taagaatact tgatcaacat 2760 
?SttSagt ?caa?ttacc atattttata aattgcgcat tctgcacagt ? ^caagttt 2820 
gcaat?ctga cttattaaaa tttcaaattc tgcatatcac aaaatctcct tatacttttg 2880 
atatggcttg cagcatttat gagttttcca aaatatagaa agcagtaggt ^gtaggagc 2940 
SScSgccS acaggtactg tctttgaatt tactactgta agactaagca ^gttactgg 3000 
Zt^^xrZi-t-i- aacttqttca atctgcttca aaaacaagaa aaacaacaac tatgagttat 30bO 
cSSS^g actccatS ?gac?agact acatttctga aagatctttg gtttacgatt 3120 
c?taagaata ttgacaatae ctataaaact ttgaagataa cttttactta a 3171 



<210> 169 
<211> 774 
<212> DNA 
<213> Homo sapiens 

<400> 169 



atgccggccc gttgtgtggc cgcccactgc ggcaacacca ccaagtctgg gaagtcgctg 60
ttccgctttc ccaaggaccg ggccgtgcgg ctgctctggg accgcttcgt gcggggttgc 120
cgcgccgact ggtacggagg caatgaccgc tcggtcatct gctctgacca ctttgcccca 180
gcctgttttg acgtctcttc ggttatccag aagaacctgc gcttctccca gcggctgagg 240
ctggtggcag gcgccgtgcc caccctgcac cgggtgcccg ccccggcacc taagagggga 300 
gaggagggag accaagcagg ccgcctggac acgcgaggag agctccaggc agccaggcat 360
tctgaggctg ccccaggtcc agtctcctgt acacgccccc gagctgggaa gcaggctgca 420
gcttcacaga ttacgtgtga aaatgaactt gtgcaaaccc aaccccatgc tgataatcca 480
tctaatactg tcacttcagt acctactcac tgtgaagaag gcccagtgca taaaagtaca 540
caaatttctt tgaaaaggcc ccgtcaccgt agtgtgggta ttcaagccaa agtgaaagcg 600 
tttggaaaaa gactgtgtaa tgcaactact cagacagagg aattgtggtc tagaacttcc 660 
tctctctttg acatttactc cagtgattca gaaacagata cagactggga tatcaagagt 720 
gaacagagtg atttgtctta tatggctgta caggtgaaag aagaaacatg ttaa 774 

<210> 170 
<211> 945 
<212> DNA 
<213> Homo sapiens 

<400> 170 

atgcctggct ttacgtgctg cgtgccaggc tgctacaaca actcgcaccg ggacaaggcg 60 
ctgcacttct acacgtttcc aaaggacgct gagttgcggc gcctctggct caagaacgtg 120 
tcgcgtgccg gcgtcagtgg gtgcttctcc accttccagc ccaccacagg ccaccgtctc 180 
tgcagcgttc acttccaggg cggccgcaag acctacacgg tacgcgtccc caccatcttc 240
ccgctgcgcg gcgtcaatga gcgcaaagta gcgcgcagac ccgctggggc cgcggccgcc 300 
cgccgcaggc agcagcagca acagcagcag cagcagcaac agcagcaaca gcagcagcag 360 
cagcaacagc agcagcagca gcagcagcag cagcagtcct caccctctgc ctccactgcc 420
cagactgccc agctgcagcc gaacctggta tctgcttccg cggccgtgct tctcaccctt 480
caggccactg tagacagcag tcaggctccg ggatccgtac agccggcgcc catcactccc 540 
actggagaag acgtgaagcc catcgatctc acagtgcaag tggagtttgc agccgcagag 600 
ggcgcagccg ctgcggccgc cgcgtcggag ttacaggctg ctaccgcagg gctggaggct 660 
gccgagtgcc ctatgggccc ccagttggtg gtggtagggg aagagggctt ccctgatact 720 
ggctccgacc attcgtactc cttgtcgtca ggcaccacgg aggaggagct cctgcgcaag 780 
ctgaatgagc agcgggacat cctggctctg atggaagtga agatgaaaga gatgaaaggc 840
agcattcgcc acctgcgtct cactgaggcc aagctgcgcg aagaactgcg tgagaaggat 900 
cggctgcttg ccatggctgt catccgcaag aagcacggaa tgtga 945 

<210> 171 

<211> 2286 

<212> DNA 

<213> Homo sapiens 

<400> 171 

atgccgaact tctgcgctgc ccccaactgc acgcggaaga gcacgcagtc cgacttggcc 60 
ttcttcaggt tcccgcggga ccctgccaga tgccagaagt gggtggagaa ctgtaggaga 120 
gcagacttag aagataaaac acctgatcag ctaaataaac attatcgatt atgtgccaaa 180 
cattttgaga cctctatgat ctgtagaact agtccttata ggacagttct tcgagataat 240
gcaataccaa caatatttga tcttaccagt catttgaaca acccacatag tagacacaga 300 
aaacgaataa aagaactgag tgaagatgaa atcaggacac tgaaacagaa aaaaattgat 360 
gaaacttctg agcaggaaca aaaacataaa gaaaccaaca atagcaatgc tcagaacccc 420 
agcgaagaag agggtgaagg gcaagatgag gacattttac ctctaaccct tgaagagaag 480 
gaaaacaaag aatacctaaa atctctattt gaaatcttga ttctgatggg aaagcaaaac 540
atacctctgg atggacatga ggctgatgaa atcccagaag gtctctttac tccagataac 600 
tttcaggcac tgctggagtg tcggataaat tctggtgaag aggttctgag aaagcggttt 660 
gagacaacag cagttaacac gttgttttgt tcaaaaacac agcagaggca gatgctagag 720 
atctgtgaga gctgtattcg agaagaaact ctcagggaag tgagagactc acacttcttt 780 
tccattatca ctgacgatgt agtggacata gcaggggaag agcacctacc tgtgttggtg 840
aggtttgttg atgaatctca taacctaaga gaggaattta taggcttcct gccttatgaa 900 
gccgatgcag aaattttggc tgtgaaattt cacactatga taactgagaa gtggggatta 960 
aatatggagt attgtcgtgg ccaggcttac attgtctcta gtggattttc ttccaaaatg 1020 
aaagttgttg cttctagact tttagagaaa tatccccaag ctatctacac actctgctct 1080 
tcctgtgcct taaatatgtg gttggcaaaa tcagtacctg ttatgggagt atctgttgca 1140
ttaggaacaa ttgaggaagt ttgttctttt ttccatcgat caccacaact gcttttagaa 1200 
cttgacaacg taatttctgt tctttttcag aacagtaaag aaaggggtaa agaactgaag 1260 
gaaatctgcc attctcagtg gacaggcagg catgatgctt ttgaaatttt agtggaactc 1320 
ctgcaagcac ttgttttatg tttagatggt ataaatagtg acacaaatat tagatggaat 1380 




aactatatag ctggccgagc atttgtactc tgcagtgcag tgtcagattt tgatttcatt 1440 
StacSttq ttgttc?taa aaatgtccta tcttttacaa gagcctttgg gaaaaacctc 1500 
claaaaclia cctctgatgt cttctttgcg gccggtagct tgactgcagt actgcattca 1560 
caggggcaaa cctuLydLyu ^^^JZ+Z ^irai-naat tttaatttga ggaagccaca 1620 
ctcaacgaag tgatggaaaa tattgaagtt tatcatgaat 9 Sgagctcac 1680 

aatttggcaa ccaaacttga ^tcaaatg "actccctg gg^ g cc | aagtgtc 174Q 

cagggtaact tggaatctca gctaacctct tctcagaaca gcacctcaaa 1800 

SSSSS S££ £££££ iSi-tt ?aata=gtcg 1860 
l^SaSS Xgetgacat g?atagaagt gacttaccca atcctgacac S«gt=ag=t 920 

bbss Sgssfa sssss ssss ? 9 - -°„ 

t"g!gga.? £5 £55S£ SSafc ? 

aXKgc?ta a="aaattt tgaiataaaa cacgacctgg atttaatggt ggacacatat 2220 
Ittaaact" "acaagtaa gtcagagctt cotacagata attccgaaac tgtggaaaat 2280 
acctaa 2286

<210> 172 
<211> 633 
<212> DNA 
<213> Mus musculus 

atgXgcagt cctgctccgc ctacggctgc aagaaccgct acgacaagga caagcccgtc 60 
tcc?tccaca agtttcctct tactcgcccc agcctttgta agcagtggga ggcagctgtt 120 
aaaaaaaaaa acttcaagcc caccaagtac agcagcatct gctcggagca cttcaccccg 180 
aaSaSSa agagggagtg caacaacaag ctactgaagg agaacgctgt gcccacaata 240 
?tSctata Kagccaca tgagaagaag gaagacctgg aatcccaaga acagctcccc 300 
^ctcrttcac cccccgcttc ccaggttgat gctgctattg ggctgctaat gccccctctg 360 
cagacccSg aScctgtc ggtJttctgt gaccacaatt acactgtgga ^tacgatg 20 
caccagagga agaggatcct gcagctggag cagcaggtgg agaaactcag ^agaagctc 480 
aagacggccc agcagcggtg ccggcggcag gagaggcagc tcgagaagct ^ggaagtc 540 
gtccactttc agagagagaa ggacgacgcg tccgagaggg gctacgtgat cctaccaaat 600 
gactactttg aaattgttga agttccagca tga 633



<210> 173 
<211> 654 
<212> DNA 
<213> Mus musculus 

<400> 173 



itaccaacca attgcgccgc ggcgggctgt gctgctacct acaacaagca cattaacatc 60 
aacttccaca ggt?tccttt ggatcctaaa agaagaaaag aatgggttcg cctggttagg 120 
cacaaaaaS Sgtgccagg aaaacacact tttctttgct caaagcactt tgaagcctcc 180 
SStSa" taaclggaca aacccgacga cttaaaatgg atgctgttcc aaccattttt 240 
altttttSa cccataLaa gtctctgaaa ctcaagtcaa ggaatcttct gaagacaaac 300 
Ilcagtt?tc ctccaactgg Iccatgtaat ttaaagctga acggcagtca J-agtactg 360 
cttaaacaca gttatgcctt taggaaccct atggaggcga aaaaaaggat aattaaacta 4^0 
aaalaqaaaa ?agcaagctt gagaaaaaaa atgaaaactt gcctgcaaag agaacgcaga 480 
aSaSSaa gg?gga?caa agccacgtgc tttgtgaaga gcttagaagc aagtaacatg 540 
cScctaagg gcSctcaga acagatttta ccaactgcct taagcaatct tcctctggaa 600 
gatttaaaaa gtcttgaaca agatcaacaa gataaaacag tacccattct ctaa 654



<210> 174 
<211> 657 
<212> DNA 
<213> Mus musculus 

<400> 174 



atgccgaagt cttgcgcggc ccggcaatgc tgcaaccgct acagcagccg caggaagcag 60 
ctcaccttcc accggttccc cttcagccgc ccggagctgt tgagggagtg ggtgctcaac 120
atcggccggg ctgacttcaa gcctaagcag cacacagtca tctgctcgga acacttcaga 180
cccgagtgct tcagcgcctt tgggaaccgc aagaacctga aacacaatgc tgtgcccacg 240
gtgttcgctt ttcagaaccc cacagaggtc tgccctgagg tgggggctgg tggggacagc 300 
tcagggagga acatggacac cacactggaa gaacttcagc ctccaacccc ggaaggcccc 360 
gtgcagcagg tcttaccaga tcgagaagca atggaggcca cggaggccgc tggcctgcct 420 
gccagccctc tggggttgaa gaggcccctt ccgggacagc cgtctgatca cagttatgcc 480
ctttcggact tggataccct caaaaaaaaa ctctttctca cactgaagga aaacaagagg 540 
cttcggaagc ggctgaaagc ccagaggctg ctgttgcgga ggacatgtgg ccgcctgaga 600 
gcctacagag agggacagcc gggacctcgg gccagacggc cggcacaggg aagctga 657



<210> 175 
<211> 558 
<212> DNA 
<213> Mus musculus

<400> 175 

atactgcaag catttggaag cctaaaaaaa ggagatgtgc tgtgttcaag acacttcaag 60
aagacagact ttgacagaag cactctaaac actaagctga aggcaggagc catcccttct 120
atctttgaat gtccatatca cttacaggag aaaagagaaa aacttcactg tagaaaaaac 180
ttccttctca aaacccttcc catcacccac catggccgcc agcttgttgg tgcctcctgc 240
attgaagaat tcgaacccca gttcattttt gaacatagct acagtgttat ggacagccca 300
aagaagctta agcataagct agaccgtgtg atcatcgagc tggagaatac caaggaaagc 360
ctacggaatg ttttagcccg agaaaaacac tttcaaaagt cactgaggaa gacaatcatg 420
gaactaaagg atgaaagtct gatcagccag gaaacagcca atagtctggg tgctttctgt 480
tgggagtgct atcatgaaag cacagcagga ggctgtagtt gtgaagtcat ttcttatatg 540
cttcatctgc agttgaca 558

<210> 176

<211> 1719

<212> DNA

<213> Homo sapiens



<400> 176 

ctttccgcgc ggcggaagag cgcgcgccag cttcggcaca cttgggagcc ggatcccagc 60 
cctacgcctc gtcccctaca agctcctcca agccccgccg gctgctgtgg gagcggcggc 120 
cgtcctctcc tggaggtcgt ctcctggcat cctcggggcc gcaggaagga agaggaggca 180 
gcggccggag ccctggtggg cggcctgagg tgagagcccg accggcccct ttgggaatat 240 
ggcgaccggt ggctaccgga ccagcagcgg cctcggcggc agcaccacag acttcctgga 300 
ggagtggaag gcgaaacgcg agaagatgcg cgccaagcag aaccccccgg gcccggcccc 360 
cccgggaggg ggcagcagcg acgccgctgg gaagcccccc gcgggggctc tgggcacccc 420
ggcggccgcc gctgccaacg agctcaacaa caacctcccg ggcggcgcgc cggccgcacc 480
tgccgtcccc ggtcccgggg gcgtgaactg cgcggtcggc tccgccatgc tgacgcgggc 540 
gcccccggcc cgcggcccgc ggcggtcgga ggacgagccc ccagccgcct ctgcctcggc 600 
tgcaccgccg ccccagcgtg acgaggagga gccggacggc gtcccagaga agggcaagag 660 
ctcgggcccc agtgccagga aaggcaaggg gcagatcgag aagaggaagc tgcgggagaa 720 
gcggcgctcc accggcgtgg tcaacatccc tgccgcagag tgcttagatg agtacgaaga 780 
tgatgaagca gggcagaaag agcggaaacg agaagatgca attacacaac agaacactat 840 
tcagaatgaa gctgtaaact tactagatcc aggcagttcc tatctgctac aggagccacc 900 
tagaacagtt tcaggcagat ataaaagcac aaccagtgtc tctgaagaag atgtctcaag 960 
tagatattct cgaacagata gaagtgggtt ccctagatat aacagggatg caaatgtttc 1020 
aggtactctg gtttcaagta gcacactgga aaagaaaatt gaagatcttg aaaaggaagt 1080 
agtaacagaa agacaagaaa acctaagact tgtgagactg atgcaagata aagaggaaat 1140
gattggaaaa ctcaaagaag aaattgattt attaaataga gacctagatg acatagaaga 1200 
tgaaaatgaa cagctaaagc aggaaaataa aactcttttg aaagttgtgg gtcagctgac 1260 
caggtagagg attcaagact caatgtggaa aaaatatttt aaactactga ttgaatgtta 1320 
atggtcaatg ctagcacaat attcctatgc tgcaatacat taaaataact aagcaagtat 1380 
atttatttct agcaaacaga tgtttgtttt caaaatactt ctttttcatt attggtttta 1440
aaaaagcatt atccttttat ctcacaaata agtaatatct ttcagttatt aaatgataga 1500 
taatgccttt ttggttttgt gtggtattca actaatacat ggtttaaagt cacagccgtt 1560 
tgaatatatt ttatcttggt agtacatttt ctcccttagg aatatacata gtctttgttt 1620 
acatgagttc caatactttt gggatgttac cctcacatgt ccctatactg atgtgtgcca 1680 
ccttttatgt gttgatgact cactcataag gttttggtc 1719



<210> 177 
<211> 878 
<212> DNA 
<213> Homo sapiens 

<400> 177 



atcccagccc acgcacagac ccccaacttg cagctgccoa cctcaccctc agctctggcc 60 

b Asss ssss sssss s» ssssg 

S SSSS 52SSS SSSSSS = 

ibi =s= sssss ssss SFIi 

ggcaagaaag gaaagggctc caaaggctgc aagaggactg ^^ggtcaca gacccctaaa 
naarcataac ccagtgagca gcctggagcc ctggagaccc caccagcctc accagcgctt o«u 
oXgcStgaa cSaga?gc aagaaggagg ctatgctcag S^cctgga gagccac 600 

S2SS 5SSSS5 SSSS SUSS ~ J - 
SSS 3255= SSSS= =225 253K SSSE X 

cctaactgaa taaaaagctg ttctgtcttc ccacccaa 878

<210> 178 
<211> 34 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Interferon gamma homology motif of THAP1 

Asn Tyr Thr Val Glu Asp Thr Met His Gln Arg Lys Arg Ile His Gln

1 5 10 15

Leu Glu Gln Gln Val Glu Lys Leu Arg Lys Lys Leu Lys Thr Ala Gln

20 25 30

Gln Arg



<210> 179 
<211> 20 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Nuclear localization sequence of THAP1 

Arg Lys Arg Ile His Gln Leu Glu Gln Gln Val Glu Lys Leu Arg Lys
1 5 10 15

Lys Leu Lys Thr 
20 



<210> 180 
<211> 38 
<212> PRT 

<213> Artificial Sequence 
<220>

<223> Consensus sequence for PAR 4 binding domain of THAP 
<221> UNSURE 

<222> 3-16, 19, 23, 24, 25-35 

<223> Xaa = any of the twenty amino acids 

<221> VARIANT 
<222> 37 

<223> Xaa = Arg or Lys 
<400> 180 

Leu Glu Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 

1 5 10 15

Gln Arg Xaa Arg Arg Gln Xaa Arg Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa

20 25 30

Xaa Xaa Xaa Gln Arg Glu
35 



<210> 181 
<211> 50 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Primer 
<400> 181 

gaattcggcc attatggcct gcaggatccg gccgcctcgg cccaggatcc 50 

<210> 182 
<211> 111 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Primer 



<400> 182 



Ser 


Asp 


Gly 


Gly 


Ala 


Gin 


Asp Cys 


Cys Leu 


Lys Tyr 


Ser 


Gin Arg Lys 


1 








5 








10 








15 


He 


Pro 


Ala 


Lys 


Val 


Val 


Arg 


Ser 


Tyr Arg 


Lys Gin 


Glu 


Pro 


Ser Leu 


Gly 






20 










25 






30 




Cys 


Ser 


He 


Pro 


Ala 


He 


Leu 


Phe Leu 


Pro Arg Lys 


Arg 


Ser Gin 






35 










40 






45 




Ala 


Glu 
50 


Leu 


Cys 


Ala 


Asp 


Pro 
55 


Lys 


Glu Leu 


Trp Val 
60 


Gin 


Gin 


Leu Met 


Gin 
65 


His 


Leu 


Asp 


Lys 


Thr 
70 


Pro 


Ser 


Pro Gin 


Lys Pro 
75 


Ala 


Gin 


Gly Cys 
80 


Arg 


Lys 


Asp 


Arg 


Gly 


Ala 


Ser 


Lys 


Thr Gly Lys Lys 


Gly 


Lys 


Gly Ser 










85 








90 




95 


Lys 


Gly 


Cys 


Lys 


Arg 


Thr 


Glu 


Arg 


Ser Gin 


Thr Pro 


Lys 


Gly 


Pro 








100 










105 




110 





<210> 183 
<211> 37 
<212> DNA 

<213> Artificial Sequence 
<220>

<223> Primer 



<400> 183

gcgggatccg tagtgatgga ggggctcagg actgttg 

<210> 184 
<211> 35 
<212> DNA 

<213> Artificial Sequence 



<220> 

<223> Primer 
<400> 184 

gcgggatccc tatggccctt taggggtctg tgacc 

<210> 185 
<211> 33 
<212> DNA 

<213> Artificial Sequence 



<220> 

<223> Primer 
<400> 185 

ccgaattcag gatggtgcag tcctgctccg cct 

<210> 186 
<211> 39 
<212> DNA 

<213> Artificial Sequence 



<220> 

<223> Primer 



<400> 186

cgcggatcct gctggtactt caactatttc aaagtagtc 

<210> 187 
<211> 33 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Primer 



<400> 187 

ccgaattcag gatggtgcag tcctgctccg 

<210> 188 
<211> 39 
<212> DNA 

<213> Artificial Sequence 



<220> 

<223> Primer 



<400> 188 

cgcggatcct gctggtactt caactatttc aaagtagtc 
<210> 189
<211> 33 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Primer 
<400> 189 

gcggaattca tggcgaccgg tggctaccgg acc 

<210> 190 
<211> 35 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Primer 
<400> 190 

gcgggatccc tctacctggt cagctgaccc acaac 

<210> 191 
<211> 33 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Primer 
<400> 191 

ccgaattcag gatggtgcag tcctgctccg cct 

<210> 192 
<211> 39 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Primer 
<400> 192

cgcggatcct gctggtactt caactatttc aaagtagtc 

<210> 193 
<211> 46 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Primer 
<400> 193 

cgcgaattcg ccatcatggg gttccctaga tataacaggg atgcaa 

<210> 194 
<211> 37 
<212> DNA 

<213> Artificial Sequence 
<220> 
<223> Primer



gccggatccg ggttccctag atataacagg gatgcaa 

<210> 195 
<211> 37 
<212> DNA 

<213> Artificial Sequence 



<220> 

<223> Primer 



<400> 195

gcgctctaga gccatcatgg aggagcagaa gctgatc 

<210> 196 
<211> 37 
<212> DNA 

<213> Artificial Sequence 



<220> 

<223> Primer 



ct^gcggccg cctctacctg gtcagctgac ccacaac 

<210> 197 
<211> 37 
<212> DNA 

<213> Artificial Sequence 



<220> 

<223> Primer 

gcggaattca aagaagatct tctggagcca caggaac 

<210> 198 
<211> 39 
<212> DNA 

<213> Artificial Sequence 



<220> 

<223> Primer 



cgcggatcct gctggtactt caactatttc aaagtagtc 

<210> 199 
<211> 35 
<212> DNA 

<213> Artificial Sequence 



<220> 

<223> Primer 



<400> 199

gcggaattca tgccgcctct tcagacccct gttaa 

<210> 200 
<211> 36
<212> DNA 
<213> Artificial Sequence 

<220> 

<223> Primer 
<400> 200 

gcggaattca tgcaccagcg gaaaaggatt catcag 36

<210> 201 
<211> 33 
<212> DNA 

<213> Artificial Sequence
<220> 

<223> Primer 
<400> 201 

ccgaattcag gatggtgcag tcctgctccg cct 33 

<210> 202 
<211> 39 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Primer 
<400> 202 

gcgggatccc ttgtcatgtg gctcagtaca aagaaatat 39 

<210> 203 
<211> 34 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Primer 
<400> 203 

cgggatcctg tgcggtcttg agcttctttc tgag 34 

<210> 204 
<211> 36 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Primer 
<400> 204 

gcgggatccg tcgtctttct ctttctggaa gtgaac 36 

<210> 205 
<211> 36 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Consensus sequence for PAR 4 binding domain of THAP 
<221> UNSURE
<222> 3-14, 17, 21, 23-33, 35 
<223> Xaa = any of the twenty amino acids 



<400> 205

Leu Glu Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Gln Arg

1 5 10 15

Xaa Arg Arg Gln Xaa Arg Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa

20 25 30

Xaa Gln Xaa Glu

35



<210> 206 
<211> 39 
<212> DNA 

<213> Artificial Sequence 



<220> 

<223> Primer 

<400> 206

ccgcacagca gcgatgcgct gctcaagaac ggcagcttg 39

<210> 207 
<211> 39 
<212> DNA 

<213> Artificial Sequence 



<220> 

<223> Primer 
<400> 207 

caagctgccg ttcttgagca gcgcatcgct gctgtgcgg 

<210> 208 
<211> 32 
<212> DNA 

<213> Artificial Sequence 



<220> 

<223> Primer 



<400> 208

gctcaagacc gcacagcaag aacggcagct tg 32

<210> 209 
<211> 32 
<212> DNA 

<213> Artificial Sequence 



<220> 

<223> Primer 

<400> 209

caagctgccg ttcttgctgt gcggtcttga gc 32

<210> 210 
<211> 36 
<212> DNA 
<213> Artificial Sequence
<220> 

<223> Primer 
<400> 210 

gcgggatccc taaattagaa aggggtgggg gtagcc 

<210> 211 
<211> 32 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Primer 
<400> 211 

gcggaattca tggagcctgc acccgcccga tc 

<210> 212 
<211> 37 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Primer 
<400> 212 

gcggaattca aagaagatct tctggagcca caggaac 

<210> 213 
<211> 39 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Primer 
<400> 213 

cgcggatcct gctggtactt caactatttc aaagtagtc 

<210> 214 
<211> 33 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Primer 
<400> 214 

cgcggatccg tgcagtcctg ctccgcctac ggc 

<210> 215 
<211> 39 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Primer 
<400> 215

ccgaattctt atgctggtac ttcaactatt tcaaagtag

<210> 216 
<211> 33 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Primer 



<400> 216

gccgaattca tgccgaactt ctgcgctgcc ccc 33

<210> 217 
<211> 40 
<212> DNA 

<213> Artificial Sequence



<220> 

<223> Primer 



<400> 217

cgcggatcct taggttattt tccacagttt cggaattatc 40

<210> 218 
<211> 39 
<212> DNA 

<213> Artificial Sequence 



<220> 

<223> Primer 



<400> 218

gcgctgcagc aagctaaatt taaatgaagg tactcttgg 39

<210> 219 
<211> 35 
<212> DNA 

<213> Artificial Sequence 



<220> 

<223> Primer 



<400> 219

gcgagatctg ggaaatgccg accaattgcg ctgcg 35

<210> 220 
<211> 35 
<212> DNA 

<213> Artificial Sequence 



<220> 

<223> Primer 



<400> 220

agaggatcct tagctctgct gctctggccc aagtc 35

<210> 221 
<211> 32 
<212> DNA 

<213> Artificial Sequence 
<220>
<223> Primer 

<400> 221 

agagaattca tgccgaagtc gtgcgcggcc cg

<210> 222 
<211> 32 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Primer 
<400> 222 

gcggaattca tgccgcgtca ctgctccgcc gc

<210> 223 
<211> 34 
<212> DNA 

<213> Artificial Sequence 



32 



32 



<220> 

<223> Primer 
<400> 223 

gcgggatcct caggccatgc tgctgctcag ctgc

<210> 224 
<211> 38 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Primer 
<400> 224 

gcgagatctc gatggtgaaa tgctgctccg ccattgga

<210> 225 
<211> 39 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Primer 
<400> 225 

gcgggatcct catgaaatat agtcctgttc tatgctctc

<210> 226 
<211> 35 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Primer 
<400> 226 

gcgagatctc gatgcccaag tactgcaggg cgccg
<210> 227
<211> 37 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Primer
<400> 227

gcggaattct tatgcactgg ggatccgagt gtccagg

<210> 228 
<211> 32 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Primer 
<400> 228 

gcggaattca tgccggcccg ttgtgtggcc gc 

<210> 229 
<211> 39 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Primer
<400> 229

gcgggatcct taacatgttt cttctttcac ctgtacagc

<210> 230 
<211> 36 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Primer 
<400> 230 

gcgagatctc gatgcctggc tttacgtgct gcgtgc 

<210> 231 
<211> 36 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Primer
<400> 231

gcggaattct cacattccgt gcttcttgcg gatgac

<210> 232 
<211> 33 
<212> DNA 

<213> Artificial Sequence 
<220>

<223> Primer 
<400> 232 

ccgaattcag gatggtgcag tcctgctccg cct 33 

<210> 233 
<211> 39 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Primer 
<400> 233 

cgcggatcct gctggtactt caactatttc aaagtagtc 39 

<210> 234 
<211> 37 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Primer 
<400> 234 

gcgctctaga gccatcatgg aggagcagaa gctgatc 37 

<210> 235 
<211> 41 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Primer 
<400> 235 

gcgctctaga ttatgctggt acttcaacta tttcaaagta g 41 

<210> 236 
<211> 33 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Primer 
<400> 236 

cgcggatccg tgcagtcctg ctccgcctac ggc 33 

<210> 237 
<211> 39 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Primer 
<400> 237 

cgcggatcct gctggtactt caactatttc aaagtagtc 39 
<210> 238
<211> 37 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Primer 
<400> 238 

gccggatccg ggttccctag atataacagg gatgcaa 

<210> 239 
<211> 35 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Primer 
<400> 239 

gcgggatccc tctacctggt cagctgaccc acaac 

<210> 240 
<211> 35 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Primer 
<400> 240 

gcgggatcca gtgatggagg ggctcaggac tgttg 

<210> 241 
<211> 35 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Primer 
<400> 241 

gcgggatccc tatggccctt taggggtctg tgacc 

<210> 242 
<211> 33 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Primer 
<400> 242 

gcgcatatgg tgcagtcctg ctccgcctac ggc 

<210> 243 
<211> 36
<212> DNA 

<213> Artificial Sequence 
<220> 
<223> Primer
<400> 243 

gcgctcgagt ttcttgtcat gtggctcagt acaaag 36

<210> 244 
<211> 62 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Oligonucleotide 

<221> unsure 
<222> 21-45 

<223> n = any of the four nucleotides 
<400> 244 

tgggcactat ttatatcaac nnnnnnnnnn nnnnnnnnnn nnnnnaatgt cgttggtggc 60 

62 

<210> 245 
<211> 30 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Primer 
<400> 245 

accgcaagct tgggcactat ttatatcaac 30

<210> 246 
<211> 24 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Primer 
<400> 246 

ggtctagagg gccaccaacg catt 24 

<210> 247 

<211> 2173 

<212> DNA 

<213> Homo sapiens 

<400> 247 



gacgggcgat ggctgtggtc 
ccccgaggga ggccaccatc 
tccaaccgag cgcagcgaca 
aagtgcctga ggaccggaag 
tacgacaagg acaagcccgt 
aaagaatggg aggcagctgt 
tgttcagagc actttactcc 
gagaatgctg tgcccacaat 
ctggagccac aggaacagct 
gctgctattg gattactaat 
gaccacaact atactgtgga 
cagcaagttg aaaaactcag 



cttctgctaa tgcaaacaac aaaacgggca cactagtcac 60 
actgtaactg ttggccaaag ctacaaaaga agcgagggaa 120 
ctgagaacag cttcccctgc cttctgcggc ggcagaagtg 180 
gatggtgcag tcctgctccg cctacggctg caagaaccgc 240 
ttctttccac aagtttcctc ttactcgacc cagtctttgt 300 
cagaagaaaa aactttaaac ccaccaagta tagcagtatt 360 
agactgcttt aagagagagt gcaacaacaa gttactgaaa 4 20 
atttctttgt actgagccac atgacaagaa agaagatctt 4 80 
tcccccacct cctttaccgc ctcctgtttc ccaggttgat 540 
gccgcctctt cagacccctg ttaatctctc agttttctgt 600 
ggatacaatg caccagcgga aaaggattca tcagctagaa 660 
aaagaagctc aagaccgcac agcagcgatg cagaaggcaa 720 
gaacggcagc
tcagaaagag 
taaaaaaatg 
tgtaaaggag 
tattttttta 
tcaggttttt 
atgtaagttg 
atgtaaaaat 
aaacttaaaa 
gggtcagatc 
atgtaaagtc 
tcagtactta 
gatcaataac 
cccctactgt 
ataaataaag 
acgctttttt 
tgtatctgcc 
atgataatct 
cttttaacac 
caacctacta 
tcaaactgtg 
tctgagtaaa 
tgataattat 
ttaagacagt 
aaaaaaaaaa 



ttgaaaaatt 
gttatgtgat 
aaatgtgtat 
tttcatttaa 
aaaaaaattc 
tacattttaa 
gtttcaagat 
aaaaagtaaa 
ttagtactgt 
atgggacata 
acaaggttca 
agaaaaaaga 
ttcagattct 
cttgcattat 
taaggcattg 
ctttcattac 
ttcaagtctc 
tgcacttgac 
aagttagaaa 
tgtagcgttt 
agttctgttc 
caaatattgc 
tacttttata 
attaaaggtg 
aaa 



aaaggaggtt 
tctaccaaat 
tgatttctaa 
aaaaataaca 
tatatatact 
caaaatattt 
tggggatttt 
gagaatgaga 
tttattgaga 
acttcttaga 
tttatctttc 
tttgattatc 
aagagtggat 
aaaattagaa 
tcatcaatga 
accctagctg 
tgacaaatgt 
tgagttggga 
ttatatccca 
attttactga 
tttgtgagaa 
tatgggagtt 
tttcaaagta 
tgaaacaaaa 



gttcacttcc 
gactactttg 
tggggcaata 
tttgattact 
gtaaaattat 
taaaagttat 
ggggtttttt 
acagtgtggt 
gaatttagtt 
atatatatat 
tgaatcagtt 
atcacagcag 
tttttttttt 
gtgtattttc 
agtaattaaa 
aaggacatcc 
gctgtgttag 
caaggcttca 
tttagttaaa 
atgtggagat 
attttacata 
atctttttag 
cactaagatc 
aaaaaaaaaa 



agaaagagaa 
aaatagttga 
ccacatatcc 
tatataaaaa 
aaattttttt 
aaactaacct 
ttttagtatt 
aaaagggtga 
atattttaaa 
acatatgtac 
atcaaagata 
aaaaaagtca 
tacatgggct 
agtggaagaa 
actgggacct 
cagttcccca 
tagagtttga 
cataaaaaat 
tgcgtgattt 
ttaaacactg 
tattggaagt 
atttagaata 
gttgaagagc 
taaaaaaaaa 



agacgacgta 780 
agtaccagca 840 
tcctctagcc 900 
cagttcagaa 960 
gtttgtaatt 1020 
cagacctcta 1080 
tatagaaata 1140 
tttcagttta 1200 
tcagaagtat 1260 
atattctcat 1320 
aattggcaag 1380 
ttgcatatct 1440 
cctatttttt 1500 
acatttttca 1560 
gatctatgat 1620 
gctgtagtta 1680 
tttgtatcat 1740 
tatttcttca 1800 
atattcagaa 1860 
aggtttctgt 1920 
gaaaatatgt 1980 
actgttccaa 2040
aatagaacct 2100 
aaaaaaaaaa 2160 
2173 



<210> 248 

<211> 1302 

<212> DNA 

<213> Homo sapiens 



<400> 248 

aattgctctg aggaccgctg ccaaagaaac gcagtagatc cgctccctct tgggggcggg 60
gagaaagaac gggttgtgtc cgccatgttg gtgaagtcaa gcgaaggcga ctagagctcc 120
aggagggcca gttctgtggg ctctagtcgg ccatattaat aaagagaaag ggaaggctga 180
ccgtccttcg cctccgcccc cacatacaca ccccttcttc ccactccgct ctcacgacta 240
agctctcacg attaaggcac gcctgcctcg attgtccagc ctctgccaga agaaagctta 300
gcagccagcg cctcagtaga gacctaaggg cgctgaatga gtgggaaagg gaaatgccga 360
ccaattgcgc tgcggcgggc tgtgccacta cctacaacaa gcacattaac atcagcttcc 420
acaggtttcc tttggatcct aaaagaagaa aagaatgggt tcgcctggtt aggcgcaaaa 480
attttgtgcc aggaaaacac acttttcttt gttcaaagca ctttgaagcc tcctgttttg 540
acctaacagg acaaactcga cgacttaaaa tggatgctgt tccaaccatt tttgattttt 600
gtacccatat aaagtctatg aaactcaagt caaggaatct tttgaagaaa aacaacagtt 660
gttctccagc tggaccatct aatttaaaat caaacattag tagtcagcaa gtactacttg 720
aacacagcta tgcctttagg aatcctatgg aggcaaaaaa gaggatcatt aaactggaaa 780
aagaaatagc aagcttaaga agaaaaatga aaacttgcct acaaaaggaa cgcagagcaa 840
ctcgaagatg gatcaaagcc acgtgtttgg taaagaattt agaagcaaat agtgtattac 900
ctaaaggtac atcagaacac atgttaccaa ctgccttaag cagtcttcct ttggaagatt 960
ttaagatcct tgaacaagat caacaagata aaacactgct aagtctaaat ctaaaacaga 1020
ccaagagtac cttcatttaa atttagcttg cacagagctt gatgcctatc cttcattctt 1080
ttcagaagta aagataatta tggcacttat gccaaaattc attatttaat aaagttttac 1140
ttgaagtaac attactgaat ttgtgaagac ttgattacaa aagaataaaa aacttcatat 1200
ggaaatttta tttgaaaatg agtggaagtg ccttacatta gaattacgga cttaaaaatt 1260
ttgctaataa attgtgtatt tgaaaaaaaa aaaaaaaaaa aa 1302



<210> 249 

<211> 1995 

<212> DNA 

<213> Homo sapiens 

<400> 249 
gSggccg?t Sttt£2£ SS^**" Cacaattcaa agccctaaaa acactgaggg 60
tSSSSS gSSSSf gcSaScaSa C t^ gagt ? gC tatgcgtCCt WttgSggS 120 
caaggctagg ag??g1ggK ?cqgacc?aa i^ 09 *** 9 ^gtaagact tgcacaggcc 180 
occr™ on ?r tcgggcctga attggggccc ggagcacccc tttacgtqqc 24 0 

cS S SXSSS cg?c g a g SKc C g S 9 a" Ct gggatgg ~ g ^tgagcJSg 30S 
cctcccctct ^cgcagjccc cgccgScgcc gccatct??a * CCgCCCCtC <=«*caggtc 360 
ctcgagatgc cgaagtcgtg cgcqqcccaa 9 ttgggggca 9 ^caggcctgg 4 20 

aagcagctca ccttccacca a?S^ 9 ^ Cagt 9 ct 9 ca accgctacag cagccgcagg 4 80 
ctgaacatcg gccggggcaa a ? afa^f 00 " agctgct ^ aa ^atgggtg 540 

ttccggccag agtjc??cag cqcctttaS aagCagcaca ^ggtcatctg ctccgagcac 60O 

2SS5S fed- ~~ 2« SSSSSS 2JS 

JSSSS ~ ?T S= s ~ JS5S2c !S 
SKSS £ T~ 5SSSS SSSS iSS 

cc?gcttggc cggJgScga aacaaKacc tt^S 9 ^ tgagaaa ^t ^atgagg 1260 

SKES IsS III ~ SSSE IS 

SSS5S SS83 ?~ SSI ™ 

aatcccatgt ctactaaaaa aMna^M? agrt:caagac cagcctggcc aacatggtga 1800 
cccagcta c tggjagjctg agacaaaaS tt^T^ 9 tggtggcaca cacctgtagt 1860 
gtgagecgag atcacJSaS toc^?S?2 ^ g ^ tgaa ^ccgggaggt ggaggttgca 1920 
gt^tgaaaaa aaaaa tgccctcta <? tattgtcact gggtgacaga gegagactea 1980 



<210> 250 

<211> 1999 

<212> DNA 

<213> Homo sapiens 

<400> 250 



SII ss s ~- — - - 

alltgSSc cXclIag^ X=a??tctc £££££ tcagagggat 180 

K3S SS SSS £ S S5S SSS 3 

ggttggtcac cgKtogJg ?ggJ2cccS atoocSaS CCaCCggca 9 ^ggagctgea 4 20 
caagctgctc tgcaagg£ga agSacacS aoSroS^ ca <? a 3tcccg caggttgaag 4 80 
caggcccagc aagctSgga acggactcca aggaggccgc ^gecaggag 540 

agtcagggaa aagcagalgc gtSgccaca S£22S™ tggccaccat ggtggcaggc 600 
atcgaagggg gcgtgLaga LagJgtggc SSSSSS iSSff 90 CaCttCCtCC 660 
. tctggggcgt gcaaatttat cggctcaSt cattSaS 2?? I t° p cccca ^ a 7 20 
cgagaaaggc catctgtccc clgagagccc aSgSaca lattn^ ° taagcacacc 78 0 
gaaccaagct gcagtgggag cagcctggga SccSacaS S*? 9Ctgaa ^aagatgtg 840 
agctcatcac ttaccgc^ac accgcagXg cct?cccaS SSSJT* gagccctccc 900 
gacgtcaccc caaagccagc cacgga^gcc ESS?* 90 CCCtcct ^c 960 

atgtccatca acgaggtcat cctotcooco KJS^ 9 9 a 5 c acagcga cgccagcccc 1020 
cactcctact gc?tI?cSc ccgg^gla? aag^SS fSS?^ Cgactcact * 10 *° 
gtggagaaga agaaeggega gctgaajagc cgcgSg «SS 
caggtgcgga agctacagga gaagctggat gagctgagga gagtgagcgt cccctatcca 1260
ag?agcc?gc tgtcgcccag ccgcgagccc cccaagatga acccagtggt ^jccactg 1320 
tcctaaatqc tgggcacctg gctgtcggac ccacctggag ccgggaccta ccccacactg ijwu 
cagSXKc agXcctgga ggaggttcac atctcccacg tgggccajcc -tgctgaac 440 
i-tctcattca actccttcca cccggacacg cgcaagccga tgcacagaga gtgtggcttc l^uu 
at?cgcctca agcccgacac caacaaggtg gcctttgtca gcgcccagaa cacaggcgtg 1560 
qtgglagtgg aggagggcga ggtgaacggg caggagctgt gcatcgcatc ccactccatc 1620 
accaaqatct ccttcgccaa ggagccccac gtagagcaga tcacccggaa gttcaggctg 1680 
aaScKaag gcaaacttga gcagacggtc tccatggcaa ccacgacaca gccaatgact 1740 
SacatSS acgtcaccla caagaaggtg accccgtaaa cctagagctt ctggagccct 1800 
cgggagggcc tggctactgt gcc?caacgg ttcggctcct caacagacag tccctgcggc I860 
alaaataqqt gtqgccgtga gcctctgcag gctcaagagt gttgtccaga tgtttctgta 1920 
Xggca?aga aaaacclaat aaaaggcctt tatttttatg gctgaggatt ttgaatatta 1980 
aaaaaaaaaa aaaaaaaaa 



<210> 251 

<211> 1398 

<212> DNA 

<213> Homo sapiens 

<400> 251 



Jgctgtgcgc cacttccggc ttcaaccccc gaaaaggcgg tgcttaaacc g*aggaggcg 60 
gaag?gag?c gacagacgag gcggctttcc cggcagaatg ctagcgcagg cgcaggggct 120 
SaaSScct ggacctgtgg cgcatcctca gtgaggaggg ccgccctgca tccgtcgccg 180 
gS?cggtct SagggScS cLccgagtc atgccccgct attgcgcagc ^tttgttgt 240 
Laaaccacc qqqqacgaaa caataaagac cggaagctga gtttttatcc atttcctcca juu 
ca?qacaaag aaagac?gga aaagtggtta aagaatatga agcgagattc atgggttccc 360 
aataaKcc agSSctatg tag?gaccat tttactcctg actctcttga catcagatgg 4 20 
ggtaScgat Sttaaaaca aactgcagtt ccaacaatat tttctttgcc tgaagacaat 480 
cagggaaaag acccttctaa aaaaaaatcc cagaagaaaa acttggaaga tgagaaagaa 540 
otltacccaa aaqccaagtc agaagaatca tttgtattaa atgagacaaa gaaaaatata bUU 
qtXacacaq atgtgcc^ca tcaacatcca gaattacttc attcatcttc cttggtaaag 660 
ccaccaqctc ccaaaacagg aagtatacaa aataacatgt taactcttaa tctagttaaa 720 
caaSaSg ggaaaccaga atctaccttg gaaacatcag ttaaccaaga tacaggtaga 7 0 
ciataattttc acacatgttt tgagaatcta aattctacaa ctattacttt gacaacttca e«u 
lattcagaaa gta^tcatca a?c?ttggaa actcaagaag ttcttgaagt aactaccagt 900 
cllltlacll atccaaactt tacaagtaaf tccatggaaa taaagtcagc acaggaaaat 960 
SattctSJ SLgcacaat taatcaaaca gttgaagaat taaacacaaa taaagaatct 1020 
gtSSgcca tttttgtacc tgctgaaaat tctaaaccct cagttaattc "ttatatct 1080 
acacaaaaaq aaaccacgga aatggaagac acagacattg aagactcctt gtataaggat 11 4U 
qtaqacXS ggacagaagt tttacaaatc gaacattctt actgcagaca agatataaat 1200 
aaaaaacatc ?ttggcagaa agtctctaag ctacattcaa agataactct tctagagtta 1260 
aaagagcaac aaaSctagg tagattgaag tctttggaag ctcttataag ? cagctaaag 1320 
caggaaaact ggctatctga agaaaacgtc aagattatag aaaaccattt tacaacatat 1380 
gaagtcacta tgatatag 

<210> 252 

<211> 2291 

<212> DNA 

<213> Homo sapiens

<400> 252

aacqaaqqca gacgcagtct ccatcgttga cgttagtcgc agtcttcgct gctaacgttt 60 
SSSSart Xctaaaatg gtgaaatgct gctccgccat tggatgtgct tctcgctgct 120 
tqccaaaSc gaagXaaaa ggactgaSat ttcacgtatt ccccacagat gaaaacatca 180 
aaaggaaatg ggtattagca Sgaaaagac ttgatgtgaa tgcagccggc jtttgggagc 240 

K2S2 32S2 3S5E S33S KgS =332 
232K =SS=S SSEK g c~ g X SKSS-i - 

agttcatttt tgaacatagc tacagtgtaa tggacagtcc aaagaaactt aagcataaat 540
tagatcatgt gatcggcgag ctagaggata caaaggaaag tctacggaat gttttagacc 600
gagaaaaacg ttttcagaaa tcattgagga agacaatcag ggaattaaag gatgaatgtc 660 
tgatcagcca agaaacagca aatagactgg acactttctg ttgggactgt tgtcaggaga 720 
gcatagaaca ggactatatt tcatgaaata atttcatgtt acgttccacc taaaattgtc 780
attggtacaa atttttataa aatctcattt accatcacta aataatatcc atcatttaaa 840 
gtgctgcttt ggattctctg gagcattatg cattatagtt gttatccaaa gacttttttg 900
aaaatatgca gaaatttgtg gtaattatgt atttgtgtct tgtgacaatt atgttttata 960 
gacctacact agtgccaggt cactattgta agatgttaaa atctcaagaa aatttcacag 1020 
SSSS;? atgatgtcaa attagtcaca ttaagctata gtagaaggaa ttggacactt 1080 
ctccagatat ttggcttcaa aggagtacct ttacttacat gtgctttatg gtaagtacat 1140 
tgaattttac tttaaatgca ttttactaca aagcacaatt catttgtaat gcatatccat 1200 
cttggattca atccaaggtg ctttagctat cagtagtacc aaaggatctt tttacaaggc 1260 
ttcctgtggt attgactctg agaataacac atagtgaaga tctgtgggct tttaaaattg 1320 
ttcacagcca atttaagaag acccctcatg aagtctcagt tttcagtaca gtacatcatt 1380 
cctcctcact aggagcactt tgatgtaaac cagaatagct ttaaaaagac aaaaaggatc 1440
gtagatctga tttttaaatg gttggttgct ctgacagatc tgaacacttt gcttcatgac 1500 
aSSafff? S^???? ^ g " taaaat ctgaatggca gtactagctc tatactttta 1560 
atactgcttt gtattttata tgtaaagtag tattgctgac attttaaaaa aatacaaaat 1620 
acaaaagaaa ccattagaaa ttaataactg tggctcttcc agttgaaata ggaattggag 1680 
agaaaggatt agaatatttt aattagggga gtagattatt gtccaaaggc ttttatttag 1740

! a " aaa f Ca H° agCtttag aata gcttct tactgaatat gcaaaagaat 1800 
aattccttgt tatttcctaa ttgatccaag tctcataaat ttagcttttg tcataattcc 1860 
ttaccgaaaa caactgaaat tgagagtcat aaatactgtg ggttagaata aaaaccattt 1920 
gccaaagcaa cactctactt agaagcacat gtacatacat ggacctcatt cagaagtcca 1980 

f ta ^" eragtatcagc catttcattg tagtaacaaa aattgaattg 204 0 
cattttgtgc tcagttgttt attgtaattt tatttttgtt acattaatat tagttaaga? 2100 
atggtcactt gaattttttg tatttaagaa ttttctgttt taatgcatgt tatactttta 2160 
tgtaggattc ccaaccttcc ctctaaatgg gatttaaccc acatctgcga gatcagcgtt 2220 
atgctaagag gaaatcactg aggccatatc tttttacaat ctgaaaaaaa agtagtaaaa 2280 
aggtagttaa a 2291

<210> 253 

<211> 1242 

<212> DNA 

<213> Homo sapiens 



<400> 253 

cgtgagtgcc gctgacagaa gtcaagagaa tcggctggga cggggttggg gcgacaacgg 60
gccggggggg acccgacagg ccagagcccc ttggggagga gcggcggctg gaggcgcgag 120
gctcctccgg atgcccggag agccgcttgc gacttaactc ccgcctcttt cccagatgcc 180
gcgtcactgc tccgccgccg gctgctgcac acgggacacg cgcgagacgc gcaaccgcgg 240
catctccttc cacagacttc ccaagaagga caacccgagg cgaggcttgt ggctggccaa 300
ctgccagcgg ctggacccca gcggccaggg cctgtgggac ccggcatccg agtacatcta 360
cttctgctcc aaacactttg aggaggactg ctttgagctg gtgggaatca gtggatatca 420
caggctaaag gagggggcag tccccaccat atttgagtct ttctccaagt tgcgccggac 480
aaccaagacc aaaggacaca gttacccacc tggcccccct gaagtcagcc ggctcagacg 540
atgcaggaag cgctgctccg agggccgagg gcccacaact ccattttctc cacctccacc 600
tgctgatgtc acctgctttc ctgtggaaga ggcctcagca cctgccactt tgccggcctc 660
cccagctggg aggctggagc ctggccttag cagccccttt tcagacctac tgggcccctt 720
gggtgcccag gcagatgaag caggctgcag cgcccagcct tcaccagagc ggcagccctc 780
ccctctcgaa ccacggccag tctccccctc agcgtatatg ctgcgcctgc ccccacccgc 840
cggagcctac atccagaatg aacacagcta ccaggtgggc agcgccttac tctggaagcg 900
gcgagccgag gcagcccttg atgcccttga caaggcccag cgccagctgc aggcctgcaa 960
gcggcgggag cagcggctgc ggttgagact gaccaagctg cagcaggagc gggcacggga 1020
gaagcgggca caggcagatg cccgccagac tctgaaggag catgtgcagg actttgccat 1080
gcagctgagc agcagcatgg cctgaggggc tgctggactg accgaggggc tgcccagcaa 1140
gactgcagcc tcttcctccc tcagatccca ccagacccac caggtgccat aataaagcgg 1200
attctagacg gagaaaaaaa aaaaaaaaaa aaaaaaaaaa aa 1242

<210> 254
<211> 1383
<212> DNA
<213> Homo sapiens

<400> 254

agcgggggtc ggggttaggc ggcgctccgc gagaaccaaa gtgcagccgc tgacccggca 60
llaclclaca aaqqctggat agccatgccc aagtactgca gggcgccgaa ctgctccaac 120 
aSgcgggcc gc??ggg?gc ajacaaccgc cctgtgagct tctacaagtt cccactgaag 180 
qatggtcccc ggctgcaggc ctggctgcag cacatgggct gtgagcactg ggtgcccagc 240 
?gccaccagc Scttgtgcag cgagcacttc acaccctcct gcttccagtg gcgctggggt 300 
q?gcgctacc tgcggcctga tgcagtgccc tccatcttct cccggggacc acctgccaag 360 
IqtSgcgga ggacccgaag cacccagaag ccagtctcgc cgccgcctcc cctacagaag 4 20 
aatacacccc Jgcccclgag ccctgccatc ccagtctctg gcccagtgcg cctagtggtg 480 
Sgggcccca catcggggag ccccaagact gtggccacca tgctcctgac ccccctggcc 540 
cctqcgccaa ctcctgagcg gtcacaacct gaagtccctg cccaacaggc ccagaccggg 600 
ctqggcccag tgctgggagc actgcaacgc cgggtgcgga ggctgcaacg gtgccaggag 660 
cggcaccagg cgcagc?g?a ggccctggaa cggctggcac agcagctaca engage 720 
ctgctggcac gggcacgccg gggtctgcag cgcctgacaa cagcccagac ccttggacct 780 
gaggaa?ccc aaaccttoac catcatctgt ggagggcctg acatagccat OTtccttgoc 840 
caaqaccctg cacctgccac agtggatgcc aagccggagc tcctggacac tcggatcccc 900 
Tallltttll gatcaagaca gacaatgtcg agggacaaaa gatagaagat ggaggaggaa 960 
IS?SSS catqqqcttg gcccagcccc accgcccacg cctgggtagt agcagtgcct 1020 
ccctcaagqg cctgggttS accaccccac tccLgggat ctcttgaacc ttaggggtga 1080 
cctgggccca agtc?ctcat cagcccccaa tcccctgggt accaggcttc tgccaccccc 1140 
ggc?cagatc tttgcaaatc agtacgacag cctcagagca gagcaagggt tgtttgggag 1200 
aatcatacct ggttctaagg agtcccacgc tttttgccaa gcctggtact gagttcatga 1260 
taccatggS qacacagc?g agaaaatccc tgccctcatg gtgctcattc tacttgagta 1320 
gacgatgaac ?agtaaacal ataaacaaga acactgcaga catgaaaaaa aaaaaaaaaa 1380 



aaa 

<210> 255 

<211> 3627 

<212> DNA 

<213> Homo sapiens 

<400> 255 



attcatgctg tcgcgggaac cccgaaggtg gggccccacg taacaagaag atgacccgaa 60 
gttgctccgc agtgggctgc agcacccgtg acaccgtgct cagccgggag cgcggcctct 120 
ccSSacca a?t?ccaact gataccatac agcgctcaaa atggatcagg ^gttaato 180 
qtgtggaccc cagaagcaaa aagatttgga ttccaggacc aggtgctata ctgtgttcca 240 
lacattttca agaaagtgac tttgagtcat atggcataag aagaaagctg aaaaaaggag 300 
cSgccttc ?gtttctcta tacaagattc ctcaaggtgt acatcttaaa ggtaaagcaa 360 
ga?aaaaaat cctaaaacaa cctcttccag acaattctca agaagttgct -tgaggaco 20 
Itaactatag tttaaagaca cctttgacga taggtgcaga gaaactggct J^gcaac 480 
aaatattaca agtgtccaaa aaaagactta tctccgtaaa gaactacagg atgatcaaga 54U 
aqagaaaggq StScgatta attgatgcac ttgtagaaga gaaactactt tctgaagaaa 600 
SSSStet gctacgagct caattttcag attttaagtg ggagttatat aattggagag 660 
aaacagatga gXctccgca gaaatgaaac aatttgcatg tacactctac ttgtgcagta 720 
gSaag?c?a ?gattatgta agaaagattc ttaagctgcc tcattcttcc atcctcagaa 780 
cataattatc caaatgccaa cccagtccag gtttcaacag caacattttt tcttttcttc «4U 
aacaaagag? agagaatgga gatcagctct atcaatactg ttcattgtta ataaaaagta 900 
iacctScaa gXacagSt oagtgggatc ctagcagtca cagtttgcag ^ttatgg 960 
actttggtct Jggaaaactt gatgctgatg aaacgccact tgcttcagaa J^gttttgt 1020 
t- a3 i- a araat aaatattttt ggccattgga gaacacctct tggttatttt tttgtaaaca iuhu 
ll^lloZ KatSgcag gctcagctgc ttcgtctgac tattggtaaa ctgagtgaca 1140 
taggaatcac agttctggct gttacatctg atgccacagc acatagtgtt ^agatggcaa 12UU 
aagcattggg gatacatatt gatggagacg acatgaaatg tacatttcag catccttcat 1260
cttctagtca acagattgca tacttctttg actcttgcca cttgctaaga ttaataagaa 1320 
atqcaKca gaattttcaa agcattcagt ttattaatgg tatagcacat tggcagcacc 1380 
SgtggagS agXgcactg glggaacagg aattatcaaa tatggaaaga ataccaagta 1440 
caStgcLa t?tgaaaaat latgtactga aagtgaatag tgccacccaa ctctttagtg 1500 
agagtgtagc cagtgcatta gaatatttgt tatccttaga cctgccacct tttcaaaact ±s>bu 
gtattggtac catccatttt ttacgtttaa tf-aap^^*. ,

ggaactgtta tggaaaggga cttalagggc SS^f gtttgacatc tttaatagta 1620 
accacgtgtt aa??gaag?c aagaSittt tS?S tgaaacttac agtaaaataa 1680 

aaataattaa aggtaagcaa aaacSggat tccSaat^ ttf^t ag " ataatc "40 
taaaatggct ctaccaaaat tatgtttlcc cS2S«J fS**^ gctgaga 9<=t 1800 
cttacaaatt cagtcatgat catctgqaJt ^cttttcct tatcttctga 1860 

taacaagttc tagccctacc tgcSjgcat tccagaaaac f?*?? 9 ? caggtattag 1920 
gatacaaatt tcaagatgaa g?ttttctaa ocSf^f 9 ttactataat ttggagacca 1980 
ctcgaaggaa agac?tggcg ctttggJcag t^aacgtca Tjltl^ attt ? aa ttg 2040 
agactgtctt tcacgaagag ggtatttgtc aaoar?™? gtatggtgtc agcgttacaa 2100 
cattactaga cctg?caga? catagjcg^ ate£at£S tSaffaS f 2160 
acaagttatc agctctttta acttotaLa 9 "atgctggt tatgttgcaa 2220 

tcaaagcctc taaaattggg tcacXSS ?2?££ ° tgcactgtat ^catcggatc 2280 
cttcagaaag tctgtgtcgg gSaJaaaS S?S gaagaatggt ttgcattttc 2340 

gaatggcaat ttttgScta gtKSaaa? ~ 9 9 " g f? agttgtaa 9 a acccattcaa 2400 
tatgtgagct ttctgggcat IttaSct?? SS" 3 ?^ gtatcttcaa cagaaaatat 24 60 
gagaagtjtg tgccSLS St g ' aStgctaJa TaT^T °^ ttgatg 2520 
taaatatcag agctaaaaat gttgcalaaa f 9 " 9 ^ ggatataata atctgtttct 2580 
atatgaaaac t?tatcaagg L cc ^ aca ^ttca gagagaactg 264 0 

ttgctaatac cagtagtaaa tteSoSS 9 ^ ggattataaa tgttcaagtt 2700 

gagagaccta aaltatSS aStttSS SaSXS " g \ tggatat ccattcaaat 2760 
tcaatttacc atattttata aattSSS taagaatact tgatcaacat tttttgaagt 2820 
cttattaaaa tttcaaattc S aaaEctS tT^tl^ gcaat tctga 2880 

cagcatttat gagttttcca aEKagaa aJctSS ^ tacttttg ^atggcttg 2940 
acaggtactg tctttgaatt tsp^f agcagtaggt cagtaggagc aaactagcca 3000 
aac??gttcl aSgcXca 2a2a2aa JSSS* 903 gtgttactgg acacag?ttt 3060 
actccattta tgactagact aSatSSga aaSStSa Sf?^? Caaaatattg ^ 
ttgacaatac ctataaaact ttgaagatK 5?£2£2 EE? 09 *** f* taagaata 31 ™ 
aaaattaggc tcaagcaaat atcaaKc? tta tagtttg 3240 

aaatagaagt atattttgat gtttgtSSt ?SaS?~f c <**?tccca ggatacccta 3300 
gtgtaaccag tttatacttc attSaSca aScSSS acagaggctt aagttttgaa 3360 
gctttttgtt gtttgtaaat agaattSS 9 ctgagaa 9 tc tgaatattgt 3420 

ggaagcc?gt ?tgaLatca cSact?taa cSattoct? SSS?" Ctt \ tttaaa 3480 
cctgatcttt atgtaaagca agattcatat * atataaatcc aagctctgta 3540 

tttgactaaa atlcaaatg? S2S gtgtagtatc taatgccctt tggtgttaca 3600 



<210> 256 
<211> 771 
<212> DNA 
<213> Homo sapiens 

<400> 256 



'2SKK £2325 g c ?ccg c t a Scg g S SS a c c T 9t f 9 ? gaagt — g *° 

cgcgccgact ggtacggagg caatgaccgS S2SS?2? ac ^ ttcgt ^cggggttgc 120 
gcctgttttg aSgtcStte ggttXSg JaSacctgc SS?^ Ctttg r CCCa 180 
ctggtggcag gcgccgtgcc caccctgcac eSS£« gcttctccca gcgcctgagg 240 

gaggagggaj LLagcfgg Scctjgac ScgcSggS aSSS"* taagagggga 300 
tctgaggctg ccccaggtcc agtctcctgt aScaSSS agCt f Caggc ^caggcat 360 
gcttcacaga ttacgtgtga alatgaacS ^tgggaa gcaggctgca 420 

tctaatacig tcacttSrt acctacKc g * gcaaaccc aaccccatgc tgataatcca 480 
caaatttctt tglaaaJgS ccgtcaccgt latitat 9 g ^ ccagtgca taaaagtaca 540 
tttggaaaaa gactgtgtSa taSSfcS ag * gtgggta ttcaagccaa agtgaaagcg 600 
tctSctttg Ica2t£tc S??gaSca gaKcaSS ca^"^ * agaacttcc 660 
gaacagagtg atttgtctta ta?g?ctgta SggtgKaJ 2£22S ]™ 

<210> 257 
<211> 942 
<212> DNA 
<213> Homo sapiens 
<400> 257

atgcctggct ttacgtgctg cgtgccaggc tgctacaaca actcgcaccg ggacaaggcg 60
ctgcacttct acacgtttcc aaaggacgct gagttgcggc gcctctggct caagaacgtg 120
tcgcgtgccg gcgtcagtgg gtgcttctcc accttccagc ccaccacagg ccaccgtctc 180
tgcagcgttc acttccaggg cggccgcaag acctacacgg tacgcgtccc caccatcttc 240
ccgctgcgcg gcgtcaatga gcgcaaagta gcgcgcagac ccgctggggc cgcggccgcc 300
cgccgcaggc agcagcagca acagcagcag cagcagcaac agcagcaaca gcagcagcag 360
cagcaacagc agcagcagca gcagcagcag cagcagtcct caccctctgc ctccactgcc 420
cagactgccc agctgcagcc gaacctggta tctgcttccg cggccgtgct tctcaccctt 480
caggccactg tagacagcag tcaggctccg ggatccgtac agccggcgcc catcactccc 540
actggagaag acgtgaagcc catcgatctc acagtgcaag tggagtttgc agccgcagag 600
ggcgcagccg ctgcggccgc cgcgtcggag ttacaggctg ctaccgcagg gctggaggct 660
gccgagtgcc ctatgggccc ccagttggtg gtggtagggg aagagggctt ccctgatact 720
ggctccgacc attcgtactc cttgtcgtca ggcaccacgg aggaggagct cctgcgcaag 780
ctgaatgagc agcgggacat cctggctctg atggaagtga agatgaaaga gatgaaaggc 840
agcattcgcc acctgcgtct cactgaggcc aagctgcgcg aagaactgcg tgagaaggat 900
cggctgcttg ccatggctgt catccgcaag aagcacggaa tg 942



<210> 258 

<211> 2283 

<212> DNA 

<213> Homo sapiens 



<400> 258 
atgccgaact tctgcgctgc ccccaactgc acgcggaaga gcacgcagtc cgacttggcc 60
ttcttcaggt tcccgcggga ccctgccaga tgccagaagt gggtggagaa ctgtaggaga 120
gcagacttag aagataaaac acctgatcag ctaaataaac attatcgatt atgtgccaaa 180
cattttgaga cctctatgat ctgtagaact agtccttata ggacagttct tcgagataat 240
gcaataccaa caatatttga tcttaccagt catttgaaca acccacatag tagacacaga 300
aaacgaataa aagaactgag tgaagatgaa atcaggacac tgaaacagaa aaaaattgat 360
gaaacttctg agcaggaaca aaaacataaa gaaaccaaca atagcaatgc tcagaacccc 420
agcgaagaag agggtgaagg gcaagatgag gacattttac ctctaaccct tgaagagaag 480
gaaaacaaag aatacctaaa atctctattt gaaatcttga ttctgatggg aaagcaaaac 540
atacctctgg atggacatga ggctgatgaa atcccagaag gtctctttac tccagataac 600
tttcaggcac tgctggagtg tcggataaat tctggtgaag aggttctgag aaagcggttt 660
gagacaacag cagttaacac gttgttttgt tcaaaaacac agcagaggca gatgctagag 720
atctgtgaga gctgtattcg agaagaaact ctcagggaag tgagagactc acacttcttt 780
tccattatca ctgacgatgt agtggacata gcaggggaag agcacctacc tgtgttggtg 840
aggtttgttg atgaatctca taacctaaga gaggaattta taggcttcct gccttatgaa 900
gccgatgcag aaattttggc tgtgaaattt cacactatga taactgagaa gtggggatta 960
aatatggagt attgtcgtgg ccaggcttac attgtctcta gtggattttc ttccaaaatg 1020
aaagttgttg cttctagact tttagagaaa tatccccaag ctatctacac actctgctct 1080
tcctgtgcct taaatatgtg gttggcaaaa tcagtacctg ttatgggagt atctgttgca 1140
ttaggaacaa ttgaggaagt ttgttctttt ttccatcgat caccacaact gcttttagaa 1200
cttgacaacg taatttctgt tctttttcag aacagtaaag aaaggggtaa agaactgaag 1260
gaaatctgcc attctcagtg gacaggcagg catgatgctt ttgaaatttt agtggaactc 1320
ctgcaagcac ttgttttatg tttagatggt ataaatagtg acacaaatat tagatggaat 1380
aactatatag ctggccgagc atttgtactc tgcagtgcag tgtcagattt tgatttcatt 1440
gttactattg ttgttcttaa aaatgtccta tcttttacaa gagcctttgg gaaaaacctc 1500
caggggcaaa cctctgatgt cttctttgcg gccggtagct tgactgcagt actgcattca 1560
ctcaacgaag tgatggaaaa tattgaagtt tatcatgaat tttggtttga ggaagccaca 1620
aatttggcaa ccaaacttga tattcaaatg aaactccctg ggaaattccg cagagctcac 1680
cagggtaact tggaatctca gctaacctct gagagttact ataaagaaac cctaagtgtc 1740
ccaacagtgg agcacattat tcaggaactt aaagatatat tctcagaaca gcacctcaaa 1800
gctcttaaat gcttatctct ggtaccctca gtcatgggac aactcaaatt caatacgtcg 1860
gaggaacacc atgctgacat gtatagaagt gacttaccca atcctgacac gctgtcagct 1920
gagcttcatt gttggagaat caaatggaaa cacaggggga aagatataga gcttccgtcc 1980
accatctatg aagccctcca cctgcctgac atcaagtttt ttcctaatgt gtatgcattg 2040
ctgaaggtcc tgtgtattct tcctgtgatg aaggttgaga atgagcggta tgaaaatgga 2100
cgaaagcgtc ttaaagcata tttgaggaac actttgacag accaaaggtc aagtaacttg 2160
gctttgctta acataaattt tgatataaaa cacgacctgg atttaatggt ggacacatat 2220
attaaactct atacaagtaa gtcagagctt cctacagata attccgaaac tgtggaaaat 2280

2283 

<210> 259 
<211> 986 
<212> DNA 
<213> Mus musculus 

<400> 259 

cttctgctaa agcaaacccc acaacggaca gggtagtcac tcgcccaccc caacccccac 60 
SSSEES ggtg3tCgtC cccgtaactg ctgaccgacg ccaccgagag cggcgSJcgt 120 
tatcaaggcc gagcgcggga ccccgacggc ccccttcgcc tgcctcccgg qccoaaaaaa ifio 
agtgtggagg gccagaagga tggtgcagtc ctgctccgcc tacggctgca 240
cgacaaggac aagcccgtct ccttccacaa gtttcctctt actlgccSca gXtSKa 300 
gcagtgggag gcagctgtta aaaggaaaaa cttcaagccc accaagtaca gc^gcaJrto 360 
£Z£KS cccacaS? ttcSS? -caaLagc ?Sc?gaagS JS 

a^ccLag^a 5 SgSccS SSgcttS S£Xg £££3X JJg 

SSSSSS afScaSf agaCCCCtga teac ^cg gtStSgg acScaatS 00 
o a f:2 g9ag ^tacgatgc accagaggaa gaggatcctg cagctggagc agcaggtgga 660 
gaaactcagg aagaagctca agacggccca gcagcggtgc cggcggcagg agagqJaqct 720 
cgagaagctc aaggaagtcg tccactttca gagagagaag gacgacgcg? cSSSSaa 780 

gatgtgttag tgggacaaga ctatacacct tcttttagcc tacatacagg agttcattta 900 

2 .« 

986 

<210> 260 

<211> 1515 

<212> DNA 

<213> Mus musculus 

<400> 260 

cS5ca?t?a tg ^ CCgagC fc ^cggcgcg gggtcgcctg cctcgtttgt 60 

S«fS acagaagctt gcttagcggg cagcgcctcc gaagtggcgt aaggtgqcqc 120 
aSacKS SSSXS f g ^ gaCGa ^tgcgccgc ggcgggctgt gc?gc?Jcct 180 
a ^™ a ^ ca t ta ffatc agcttccaca ggtttccttt ggatcctaaa agaagaaaaq 240 
aatgggttcg cctggttagg cgcaaaaatt ttgtgccagg aaaacacact tttcttS? 300 
caaagcactt tgaagcctcc tgttttgatc taacaggacj aacccqJcqa cS22?S« 360 
aal^l^ aaCCattttt gatttttgta cccatSaaa gtatTglll ScaagtcIJ 420 
ggaatcttct gaagacaaac aacagttttc ctccaactgg accatqtaat tSSSJ^- 480 
acggcagtca gcaagtactg cttgaacaca gttatgcctt XggalcXt a^aqScaa 540 
aaaaaaggat aattaaacta gaaaaggaaa tagcalgctt gaglaaJala ItoXSIt? loo 
acS^^ agaa ^ Ca ^ a gcaactcgaa ggtgga?caa Igccacgtgc SjgJgaSga 660 
gcttagaagc aagtaacatg ctacctaagg gcatctcaga acagatttta ccalctocct 720 
taagcaatct tcctctggaa gatttaaaaa gtcttgaaca agatcaacaa S2!«f™ 780 
tacccattct ctaaatgtaa aatggaagag actctctgca ScaagSX Stcaclcaq 840 
aacccagtgc ccagctcctg ccgtccccac ccaccgcact ctgaclgtta c2?aS!tc 900 
aagtcctgca gttttacttg aagtagtagt gtcagtgtca ctctctggag acJqaSSaaS 960 
B ggg S aatcc aa tgacaagc ttgacaccga gcagaagtgc cttaca?gag ggtJacSJac 1020 
atJSaS 9 SESK? ggttt f gCt ctt ^ttttt taagctgctg ?£S2£S ™lo 
^;? a = a ^ g 9 ^gttcat aaaagtaaaa gcattccgca ccaaagctgg gatattacat 1140 
tctaaagaac atgtgaagta ggagctaact gcattaaata tgatdtaS IctactaSo 1200 
S^ gtat gaattaaatt attgggattg tggttgaaaa ttttatagal KtacXct 1260 
gggg J a = ggg ^ aa ggtttg tttctttgtt ttgttttgtt ttttgtc?tt tKagcSS 1320 
tgtattttaa ctagtaaaag taaacttatc atggcctttt tttat sanaa I III , 

2ZS£2£ tct *T ta9 ^""^ S£22£ SSSSS iSS 

SXagalK SSf ««=«=,ga ctgggaaaac coagtgagtt atagtcaacg 1500 

1515 

<210> 261 
<211> 1120
<212> DNA
<213> Mus musculus

<400> 261

gaggggcagt gggcccatct ccgagatgcc gaagtcttgc gcggcccggc aatgctgcaa 60
ccgctacagc agccgcagga agcagctcac cttccaccgg ttccccttca gccgcccgga 120
gctgttgagg gagtgggtgc tcaacatcgg ccgggctgac ttcaagccta agcagcacac 180
agtcatctgc tcggaacact tcagacccga gtgcttcagc gcctttggga accgcaagaa 240
cctgaaacac aatgctgtgc ccacggtgtt cgcttttcag aaccccacag aggtctgccc 300
tgaggtgggg gctggtgggg acagctcagg gaggaacatg gacaccacac tggaagaact 360
tcagcctcca accccggaag gccccgtgca gcaggtctta ccagatcgag aagcaatgga 420
ggccacggag gccgctggcc tgcctgccag ccctctgggg ttgaagaggc cccttccggg 480
acagccgtct gatcacagtt atgccctttc ggacttggat accctcaaaa aaaaactctt 540
tctcacactg aaggaaaaca agaggcttcg gaagcggctg aaagcccaga ggctgctgtt 600
gcggaggaca tgtggccgcc tgagagccta cagagaggga cagccgggac ctcgggccag 660
acggccggca cagggaagct gagcctgagc aagctctggg atgtgggggt ggtggcaaca 720
ccttagcagg aagtggtgtt ctggcctgct atgggcgttt ctacccgctg ctgatgctgc 780
aggtgccttg agagtgggat gggatgctgc gacaggcagt tgtcgggtgg gggcccaagt 840
actgcggagg caccgtccca ggtttcttgg gctgaggctg tcagctgtgg ggaagcagca 900
gtgaccaaat gtgagccgtc acaaccccct caagagatgc tcccagaggg agagctggtc 960
attcttacag ccggtggggt ccttactgtc tccccatagg agccattctg atggcaggca 1020
gggcaagggt ccccgtcagc ctgtatttct gagtgactct tttttctgcc tggttcgtgt 1080
agatgtggaa taaatctttt gaagtctcca aaaaaaaaaa 1120

<210> 262
<211> 558
<212> DNA
<213> Mus musculus

<400> 262

atactgcaag catttggaag cctaaaaaaa ggagatgtgc tgtgttcaag acacttcaag 60
aagacagact ttgacagaag cactctaaac actaagctga aggcaggagc catcccttct 120
atctttgaat gtccatatca cttacaggag aaaagagaaa aacttcactg tagaaaaaac 180
ttccttctca aaacccttcc catcacccac catggccgcc agcttgttgg tgcctcctgc 240
attgaagaat tcgaacccca gttcattttt gaacatagct acagtgttat ggacagccca 300
aagaagctta agcataagct agaccgtgtg atcatcgagc tggagaatac caaggaaagc 360
ctacggaatg ttttagcccg agaaaaacac tttcaaaagt cactgaggaa gacaatcatg 420
gaactaaagg atgaaagtct gatcagccag gaaacagcca atagtctggg tgctttctgt 480
tgggagtgct atcatgaaag cacagcagga ggctgtagtt gtgaagtcat ttcttatatg 540
cttcatctgc agttgaca 558

<210> 263
<211> 37
<212> PRT
<213> Artificial Sequence

<223> Consensus sequence for PAR 4 binding domain of THAP
<221> UNSURE
<222> 3-15, 18, 22, 24-34, 36
<223> Xaa - any of the twenty amino acids

<400> 263

Leu Glu Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Gln 

1 5 10 15 

Arg Xaa Arg Arg Gln Xaa Arg Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 

20 25 30 

Xaa Xaa Gln Xaa Glu 
35 